COMPOSITIONS AND METHODS RELATING TO
DNA MISMATCH REPAIR GENES

This invention was made with government support under Agreement 
No. GM 32741 and Agreement No. HG00395/GM50006 awarded by the National 
Institute of Health in the General Sciences Division. The government has certain 
rights in the invention. 

This application is a continuation-in-part from U.S. Patent 
Application Serial No. 08/209,521, titled: MAMMALIAN DNA MISMATCH 
REPAIR GENES PMS1 AND MLH1, filed on March 8, 1994, which is a
continuation-in-part from U.S. Patent Application Serial No. 08/168,877, filed on 
December 17, 1993. All of the above patent applications are incorporated by 
reference. 

Field of the Invention 
The present invention involves DNA mismatch repair genes. In 
particular, the invention relates to identification of mutations and polymorphisms 
in DNA mismatch repair genes, to identification and characterization of DNA 
mismatch-repair-defective tumors, and to detection of genetic susceptibility to 
cancer. 

Background 

In recent years, with the development of powerful cloning and 
amplification techniques such as the polymerase chain reaction (PCR), in 
combination with a rapidly accumulating body of information concerning the 
structure and location of numerous human genes and markers, it has become 
practical and advisable to collect and analyze samples of DNA or RNA from 
individuals who are members of families which are identified as exhibiting a high 
frequency of certain genetically transmitted disorders. For example, screening 
procedures are routinely used to screen for genes involved in sickle cell anemia, 
cystic fibrosis, fragile X chromosome syndrome and multiple sclerosis. For some 
types of disorders, early diagnosis can greatly improve the person's long-term
prognosis, for example, by allowing adoption of an aggressive diagnostic routine
and/or lifestyle changes, if appropriate, to either prevent or prepare for an
anticipated problem.

Once a particular human gene mutation is identified and linked to
a disease, development of screening procedures to identify high-risk individuals
can be relatively straightforward. For example, after the structure and abnormal
phenotypic role of the mutant gene are understood, it is possible to design
primers for use in PCR to obtain amplified quantities of the gene from individuals
for testing. However, initial discovery of a mutant gene, i.e., its structure, location
and linkage with a known inherited health problem, requires substantial
experimental effort and creative research strategies.

One approach to discovering the role of a mutant gene in causing
a disease begins with clinical studies on individuals who are in families which
exhibit a high frequency of the disease. In these studies, the approximate location
of the disease-causing locus is determined indirectly by searching for a
chromosome marker which tends to segregate with the locus. A principal
limitation of this approach is that, although the approximate genomic location of
the gene can be determined, it does not generally allow actual isolation or
sequencing of the gene. For example, Lindblom et al. 3 reported results of linkage
analysis studies performed with SSLP (simple sequence length polymorphism)
markers on individuals from a family known to exhibit a high incidence of
hereditary non-polyposis colon cancer (HNPCC). Lindblom et al. found a "tight
linkage" between a polymorphic marker on the short arm of human chromosome
3 (3p21-23) and a disease locus apparently responsible for increasing an
individual's risk of developing colon cancer. Even though 3p21-23 is a fairly
specific location relative to the entire genome, it represents a huge DNA region
relative to the probable size of the mutant gene. The mutant gene could be
separated from the markers identifying the locus by millions of bases. At best,
such linkage studies have only limited utility for screening purposes because in
order to predict one person's risk, genetic analysis must be performed with tightly
linked genetic markers on a number of related individuals in the family. It is
often impossible to obtain such information, particularly if affected family
members are deceased. Also, informative markers may not exist in the family
under analysis. Without knowing the gene's structure, it is not possible to sample,
amplify, sequence and determine directly whether an individual carries the mutant
gene.

Another approach to discovering a disease-causing mutant gene
begins with design and trial of PCR primers, based on known information about
the disease, for example, theories for disease state mechanisms, related protein
structures and function, possible analogous genes in humans or other species, etc.
The objective is to isolate and sequence candidate normal genes which are
believed to sometimes occur in mutant forms rendering an individual disease
prone. This approach is highly dependent on how much is known about the
disease at the molecular level, and on the investigator's ability to construct
strategies and methods for finding candidate genes. Association of a mutation in
a candidate gene with a disease must ultimately be demonstrated by performing
tests on members of a family which exhibits a high incidence of the disease. The
most direct and definitive way to confirm such linkage in family studies is to use
PCR primers which are designed to amplify portions of the candidate gene in
samples collected from the family members. The amplified gene products are
then sequenced and compared to the normal gene structure for the purpose of
finding and characterizing mutations. A given mutation is ultimately implicated
by showing that affected individuals have it while unaffected individuals do not,
and that the mutation causes a change in protein function which is not simply a
polymorphism.
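
The co-segregation criterion described above (affected individuals carry the mutation, unaffected individuals do not) can be illustrated with a minimal sketch. The function name, the pedigree labels and the simple all-or-none rule are assumptions made for illustration only; an actual family study would also have to account for incomplete penetrance, age of onset and sample availability.

```python
def segregates_with_disease(carriers, affected, family):
    """Return True if every affected member carries the variant
    and no unaffected member carries it (hypothetical all-or-none rule)."""
    carriers = set(carriers)
    affected = set(affected)
    unaffected = set(family) - affected
    all_affected_carry = affected <= carriers
    no_unaffected_carry = not (unaffected & carriers)
    return all_affected_carry and no_unaffected_carry


# Hypothetical five-member kindred; labels are illustrative only.
family = ["I-1", "I-2", "II-1", "II-2", "II-3"]
affected = ["I-1", "II-2"]

print(segregates_with_disease(["I-1", "II-2"], affected, family))
print(segregates_with_disease(["I-1", "II-2", "II-3"], affected, family))
```

In this toy example the first variant is found only in affected members, while the second is also carried by an unaffected member and so fails the rule.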

Another way to show a high probability of linkage between a
candidate gene mutation and disease is by determining the chromosome location
of the gene, then comparing the gene's map location to known regions of
disease-linked loci such as the one identified by Lindblom et al. Coincident map location
of a candidate gene in the region of a previously identified disease-linked locus
may strongly implicate an association between a mutation in the candidate gene
and the disease.

There are other ways to show that mutations in a gene candidate
may be linked to the disease. For example, artificially produced mutant forms of
the gene can be introduced into animals. Incidence of the disease in animals
carrying the mutant gene can then be compared to that in animals with the normal
genotype. Significantly elevated incidence of disease in animals with the mutant
genotype, relative to animals with the wild-type gene, may support the theory that
mutations in the candidate gene are sometimes responsible for occurrence of the
disease.

One type of disease which has recently received much attention
because of the discovery of disease-linked gene mutations is Hereditary
Nonpolyposis Colon Cancer (HNPCC). 1, 2 Members of HNPCC families also
display increased susceptibility to other cancers including endometrial, ovarian,
gastric and breast. Approximately 10% of colorectal cancers are believed to be
HNPCC. Tumors from HNPCC patients display an unusual genetic defect in
which short, repeated DNA sequences, such as the dinucleotide repeat sequences
found in human chromosomal DNA ("microsatellite DNA"), appear to be
unstable. This genomic instability of short, repeated DNA sequences, sometimes
called the "RER+" phenotype, is also observed in a significant proportion of a
wide variety of sporadic tumors, suggesting that many sporadic tumors may have
acquired mutations that are similar (or identical) to mutations that are inherited
in HNPCC.

Genetic linkage studies have identified two HNPCC loci thought to
account for as much as 90% of HNPCC. The loci map to human chromosome
2p15-16 (2p21) and 3p21-23. Subsequent studies have identified human DNA
mismatch repair gene hMSH2 as being the gene on chromosome 2p21, in which
mutations account for a significant fraction of HNPCC cancers. 1, 2, 12 hMSH2 is
one of several genes whose normal function is to identify and correct DNA
mispairs including those that follow each round of chromosome replication.

The best defined mismatch repair pathway is the E. coli MutHLS
pathway that promotes a long-patch (approximately 3 kb) excision repair reaction
which is dependent on the mutH, mutL, mutS and mutU (uvrD) gene products.
The MutHLS pathway appears to be the most active mismatch repair pathway in
E. coli and is known both to increase the fidelity of DNA replication and to act on
recombination intermediates containing mispaired bases. The system has been
reconstituted in vitro, and requires the mutH, mutL, mutS and uvrD (helicase II)
proteins along with DNA polymerase III holoenzyme, DNA ligase, single-stranded
DNA binding protein (SSB) and one of the single-stranded DNA exonucleases,
Exo I, Exo VII or RecJ. hMSH2 is homologous to the bacterial mutS gene. A
similar pathway in yeast includes the yeast MSH2 gene and two mutL-like genes
referred to as PMS1 and MLH1.

With the knowledge that mutations in a human mutS type gene 
(hMSH2) sometimes cause cancer, and the discovery that HNPCC tumors exhibit 
microsatellite DNA instability, interest in other DNA mismatch repair genes and 
gene products, and their possible roles in HNPCC and/or other cancers, has 
intensified. It is estimated that as many as 1 in 200 individuals carry a mutation 
in either the hMSH2 gene or other related genes which encode other proteins
in the same DNA mismatch repair pathway. 

An important objective of our work has been to identify human 
genes which are useful for screening and identifying individuals who are at 
elevated risk of developing cancer. Other objects are: to determine the 
sequences of exons and flanking intron structures in such genes; to use the 
structural information to design testing procedures for the purpose of finding and 
characterizing mutations which result in an absence of or defect in a gene product 
which confers cancer susceptibility; and to distinguish such mutations from 
"harmless" polymorphic variations. Another object is to use the structural 
information relating to exon and flanking intron sequences of a cancer-linked 
gene, to diagnose tumor types and prescribe appropriate therapy. Another object 
is to use the structural information relating to a cancer-linked gene to identify 
other related candidate human genes for study. 

Summary of the Invention 
Based on our knowledge of DNA mismatch repair mechanisms in 
bacteria and yeast including conservation of mismatch repair genes, we reasoned 
that human DNA mismatch repair homologs should exist, and that mutations in
such homologs affecting protein function would be likely to cause genetic
instability, possibly leading to an increased risk of developing certain forms of
human cancer.

We have isolated and sequenced two human genes, hPMS1 and
hMLH1, each of which encodes a protein involved in DNA mismatch repair.
hPMS1 and hMLH1 are homologous to mutL genes found in E. coli. Our studies
strongly support an association between mutations in DNA mismatch repair genes
and susceptibility to HNPCC. Thus, DNA mismatch repair gene sequence
information of the present invention, namely, cDNA and genomic structures
relating to hMLH1 and hPMS1, makes possible a number of useful methods
relating to cancer risk determination and diagnosis. The invention also
encompasses a large number of nucleotide and protein structures which are useful
in such methods.

We mapped the location of hMLH1 to human chromosome 3p21-23.
This is a region of the human genome that, based upon family studies, harbors a
locus that predisposes individuals to HNPCC. Additionally, we have found a
mutation in a conserved region of the hMLH1 cDNA in HNPCC-affected
individuals from a Swedish family. The mutation is not found in unaffected
individuals from the same family, nor is it a simple polymorphism. We have also
found that a homologous mutation in yeast results in a defective DNA mismatch
repair protein. We have also found a frameshift mutation in hMLH1 of affected
individuals from an English family. Our discovery of cancer-linked mutations
in hMLH1, combined with the gene's map position which is coincident with a
previously identified HNPCC-linked locus, plus the likely role of the hMLH1 gene
in mutation avoidance, makes the hMLH1 gene a prime candidate for underlying
one form of common inherited human cancer, and a prime candidate to screen
and identify individuals who have an elevated risk of developing cancer.

hMLH1 has 19 exons and 18 introns. We have determined the
location of each of the 18 introns relative to hMLH1 cDNA. We have also
determined the structure of all intron/exon boundary regions of hMLH1.
Knowledge of the intron/exon boundary structures makes possible efficient
screening regimes to locate mutations which negatively affect the structure and
function of gene products. Further, we have designed complete sets of
oligonucleotide primer pairs which can be used in PCR to amplify individual
complete exons together with surrounding intron boundary structures.

We mapped the location of hPMS1 to human chromosome 7.
Subsequent studies by others 39 have confirmed our prediction that mutations in
this gene are linked to HNPCC.

The most immediate use of the present invention will be in
screening tests on human individuals who are members of families which exhibit
an unusually high frequency of early onset cancer, for example HNPCC.
Accordingly, one aspect of the invention comprises a method of diagnosing cancer
susceptibility in a subject by detecting a mutation in a mismatch repair gene or
gene product in a tissue from the subject, wherein the mutation is indicative of
the subject's susceptibility to cancer. In a preferred embodiment of the invention,
the step of detecting comprises detecting a mutation in a human mutL homolog
gene, for example, hMLH1 or hPMS1.

The method of diagnosing preferably comprises the steps of: 1)
amplifying a segment of the mismatch repair gene or gene product from an
isolated nucleic acid; 2) comparing the amplified segment with an analogous
segment of a wild-type allele of the mismatch repair gene or gene product; and
3) detecting a difference between the amplified segment and the analogous
segment, the difference being indicative of a mutation in the mismatch repair
gene or gene product which confers cancer susceptibility.
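
Steps 2) and 3) above amount to a position-by-position comparison of two sequences. The following is a minimal sketch of that comparison, assuming the amplified segment and the wild-type segment have the same length and are already in register (substitutions only, no insertions or deletions). The function name and the nine-base toy sequences are hypothetical and are not taken from hMLH1 or hPMS1.

```python
def point_differences(wild_type, sample):
    """List (1-based position, wild-type base, sample base) for each mismatch.

    Assumes equal-length, pre-aligned sequences; insertions and deletions
    would require a real alignment step, which is not attempted here.
    """
    if len(wild_type) != len(sample):
        raise ValueError("sequences must have the same length")
    pairs = zip(wild_type.upper(), sample.upper())
    return [(i + 1, w, s) for i, (w, s) in enumerate(pairs) if w != s]


# Toy sequences (illustrative only): a single C-to-T change at position 5.
wild_type = "ATGTCTGGA"
sample = "ATGTTTGGA"
print(point_differences(wild_type, sample))
```

A difference found this way is only a candidate; whether it is a harmless polymorphism or a mutation affecting protein function has to be established separately.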

Another aspect of the invention provides methods of determining
whether the difference between the amplified segment and the analogous
wild-type segment causes an affected phenotype, i.e., whether the sequence alteration
affects the individual's ability to repair DNA mispairs.

The method of diagnosing may include the steps of: 1) reverse
transcribing all or a portion of an RNA copy of a DNA mismatch repair gene;
and 2) amplifying a segment of the DNA produced by reverse transcription. An
amplifying step in the present invention may comprise: selecting a pair of
oligonucleotide primers capable of hybridizing to opposite strands of the
mismatch repair gene, in an opposite orientation; and performing a polymerase
chain reaction utilizing the oligonucleotide primers such that nucleic acid of the
mismatch repair gene intervening between the primers is amplified to become
the amplified segment.
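
The geometry of the amplifying step (two primers on opposite strands, in opposite orientation, flanking the segment to be amplified) can be sketched in a few lines. This is an illustrative in-silico model only, assuming exact primer matches on a single template strand written 5' to 3'; the function names and the toy template and primers are hypothetical and do not correspond to any SEQ ID NO of the invention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the segment bounded by the two primers, or None if not found.

    template: top strand, 5'->3'.
    forward:  primer identical to the top strand (anneals to the bottom strand).
    reverse:  primer that anneals to the top strand, so its reverse
              complement appears in the template downstream of `forward`.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy example (illustrative sequences only).
template = "GGATCCATGCATTTTGACCGTAAGCTT"
print(amplicon(template, "ATGCAT", "TTACGG"))
```

Real primer design would also consider mismatches, melting temperature and off-target sites, none of which this sketch models.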



WO 95/16793 PCT/US94/14746 

In preferred embodiments of the methods summarized above, the
DNA mismatch repair gene is hMLH1 or hPMS1. The segment of DNA
corresponds to a unique portion of a nucleotide sequence selected from the group
consisting of SEQ ID NOS: 6-24. "First stage" oligonucleotide primers selected
from the group consisting of SEQ ID NOS: 44-82 are used in PCR to amplify the
DNA segment. The invention also provides a method of using "second stage"
nested primers (SEQ ID NOS: 83-122), for use with the first stage primers to
allow more specific amplification and conservation of template DNA.

Another aspect of the present invention provides a method of
identifying and classifying a DNA mismatch repair defective tumor comprising
detecting in a tumor a mutation in a mismatch repair gene or gene product,
preferably a mutL homolog (hMLH1 or hPMS1), the mutation being indicative
of a defect in a mismatch repair system of the tumor.

The present invention also provides useful nucleotide and protein 
compositions. One such composition is an isolated nucleotide or protein structure 
including a segment sequentially corresponding to a unique portion of a human 
mutL homolog gene or gene product, preferably derived from either hMLH1 or
hPMS1.

Other composition aspects of the invention comprise oligonucleotide 
primers capable of being used together in a polymerase chain reaction to amplify 
specifically a unique segment of a human mutL homolog gene, preferably hMLH1
or hPMS1.

Another aspect of the present invention provides a probe including 
a nucleotide sequence capable of binding specifically by Watson/Crick pairing to 
complementary bases in a portion of a human mutL homolog gene; and a label- 
moiety attached to the sequence, wherein the label-moiety has a property selected 
from the group consisting of fluorescent, radioactive and chemiluminescent. 

We have also isolated and sequenced mouse MLH1 (mMLH1) and
PMS1 (mPMS1) genes. We have used our knowledge of mouse mismatch repair
genes to construct animal models for studying cancer. The models will be useful
to identify additional oncogenes and to study environmental effects on
mutagenesis.

We have produced polyclonal antibodies directed to a portion of the
protein encoded by mPMS1 cDNA. The antibodies also react with hPMS1
protein and are useful for detecting the presence of the protein encoded by a
normal hPMS1 gene. We are also producing monoclonal antibodies directed to
hMLH1 and hPMS1.

In addition to diagnostic and therapeutic uses for the genes, our
knowledge of hMLH1 and hPMS1 can be used to search for other genes of
related function which are candidates for playing a role in certain forms of human
cancer.



Description of the Figures 
Figure 1 is a flow chart showing an overview of the sequence of
experimental steps we used to isolate, characterize and use human and mouse
PMS1 and MLH1 genes.

Figure 2 is an alignment of protein sequences for mutL homologs
(SEQ ID NOS: 1-3) showing two highly-conserved regions (underlined) which we
used to create degenerate PCR oligonucleotides for isolating additional mutL
homologs.

Figure 3 shows the entire cDNA nucleotide sequence (SEQ ID NO:
4) for the human MLH1 gene, and the corresponding predicted amino acid
sequence (SEQ ID NO: 5) for the human MLH1 protein. The underlined DNA
sequences are the regions of cDNA that correspond to the degenerate PCR
primers that were originally used to amplify a portion of the MLH1 gene
(nucleotides 118-135 and 343-359).

Figure 4A shows the nucleotide sequences of the 19 exons which
collectively correspond to the entire hMLH1 cDNA structure. The exons are
flanked by intron boundary structures. Primer sites are underlined. The exons
with their flanking intron structures correspond to SEQ ID NOS: 6-24. The
exons, shown in non-underlined lower case letters, correspond to SEQ ID NOS:
25-43.

Figure 4B shows nucleotide sequences of primer pairs which have
been used in PCR to amplify the individual exons. The "second stage"
amplification primers (SEQ ID NOS: 83-122) are "nested" primers which are
used to amplify target exons from the amplification product obtained with
corresponding "first stage" amplification primers (SEQ ID NOS: 44-82). The
structures in Figure 4B correspond to the structures in Tables 2 and 3.

Figure 5 is an alignment of the predicted amino acid sequences for
human and yeast (SEQ ID NOS: 5 and 123, respectively) MLH1 proteins.
Amino acid identities are indicated by boxes and gaps are indicated by dashes.

Figure 6 is a phylogenetic tree of MutL-related proteins. 

Figure 7 is a two-panel photograph. The first panel (A) is a
metaphase spread showing hybridization of the hMLH1 gene to chromosome 3.
The second panel (B) is a composite of chromosome 3 from multiple metaphase
spreads aligned with a human chromosome 3 ideogram. The region of
hybridization is indicated in the ideogram by a vertical bar.

Figure 8 is a comparison of sequence chromatograms from affected
and unaffected individuals showing identification of a C to T transition mutation
that produces a non-conservative amino acid substitution at position 44 of the
hMLH1 protein.

Figure 9 is an amino acid sequence alignment (SEQ ID NOS: 124-131)
of the highly-conserved region of the MLH family of proteins surrounding
the site of the predicted amino acid substitution. Bold type indicates the position
of the predicted serine to phenylalanine amino acid substitution in affected
individuals. Also highlighted are the serine or alanine residues conserved at this
position in MutL-like proteins. Bullets indicate positions of highest amino acid
conservation. For the MLH1 protein, the dots indicate that the sequence has not
been obtained. Sequences were aligned as described below in reference to the
phylogenetic tree of Figure 6.

Figure 10 shows the entire nucleotide sequence for hPMS1 (SEQ
ID NO: 132).

Figure 11 is an alignment of the predicted amino acid sequences for
human and yeast PMS1 proteins (SEQ ID NOS: 133 and 134, respectively).
Amino acid identities are indicated by boxes and gaps are indicated by dashes.

Figure 12 is a partial nucleotide sequence of mouse MLH1
(mMLH1) cDNA (SEQ ID NO: 135).

Figure 13 is a comparison of the predicted amino acid sequence for
mMLH1 and hMLH1 proteins (SEQ ID NOS: 136 and 5, respectively).

Figure 14 shows the cDNA nucleotide sequence for mouse PMS1
(mPMS1) (SEQ ID NO: 137).

Figure 15 is a comparison of the predicted amino acid sequences
for mPMS1 and hPMS1 proteins (SEQ ID NOS: 138 and 133, respectively).

Definitions 

gene - "Gene" means a nucleotide sequence that contains a complete coding 
sequence. Generally, "genes" also include nucleotide sequences found upstream 
(e.g. promoter sequences, enhancers, etc.) or downstream (e.g. transcription 
termination signals, polyadenylation sites, etc.) of the coding sequence that affect 
the expression of the encoded polypeptide. 

gene product - A "gene product" is either a DNA or RNA (mRNA) copy of a 
portion of a gene, or a corresponding amino acid sequence translated from 
mRNA. 

wild-type - The term "wild-type", when applied to nucleic acids and proteins of 
the present invention, means a version of a nucleic acid or protein that functions 
in a manner indistinguishable from a naturally-occurring, normal version of that 
nucleic acid or protein (i.e., a nucleic acid or protein with wild-type activity). For
example, a "wild-type" allele of a mismatch repair gene is capable of functionally 
replacing a normal, endogenous copy of the same gene within a host cell without 
detectably altering mismatch repair in that cell. Different wild-type versions of 
the same nucleic acid or protein may or may not differ structurally from each 
other. 

non-wild-type - The term "non-wild-type", when applied to nucleic acids and
proteins of the present invention, means a version of a nucleic acid or protein that
functions in a manner distinguishable from a naturally-occurring, normal version
of that nucleic acid or protein. Non-wild-type alleles of a nucleic acid of the
invention may differ structurally from wild-type alleles of the same nucleic acid
in any of a variety of ways including, but not limited to, differences in the amino
acid sequence of an encoded polypeptide and/or differences in expression levels
of an encoded nucleotide transcript or polypeptide product.

For example, the nucleotide sequence of a non-wild-type allele of 
a nucleic acid of the invention may differ from that of a wild-type allele by, for 
example, addition, deletion, substitution, and/or rearrangement of nucleotides. 
Similarly, the amino acid sequence of a non-wild-type mismatch repair protein 
may differ from that of a wild-type mismatch repair protein by, for example, 
addition, substitution, and/or rearrangement of amino acids. 

Particular non-wild-type nucleic acids or proteins that, when 
introduced into a normal host cell, interfere with the endogenous mismatch repair 
pathway, are termed "dominant negative" nucleic acids or proteins. 

homologous - The term "homologous" refers to nucleic acids or polypeptides that 
are highly related, at the level of nucleotide or amino acid sequence. Nucleic 
acids or polypeptides that are homologous to each other are termed
"homologues". 

The term "homologous" necessarily refers to a comparison between 
two sequences. In accordance with the invention, two nucleotide sequences are 
considered to be homologous if the polypeptides they encode are at least about 
50-60% identical, preferably about 70% identical, for at least one stretch of at 
least 20 amino acids. Preferably, homologous nucleotide sequences are also 
characterized by the ability to encode a stretch of at least 4-5 uniquely specified 
amino acids. Both the identity and the approximate spacing of these amino acids 
relative to one another must be considered for nucleotide sequences to be 
considered to be homologous. For nucleotide sequences less than 60 nucleotides 
in length, homology is determined by the ability to encode a stretch of at least 4-5 
uniquely specified amino acids. 
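
The identity criterion in this definition (a minimum percent identity over at least one stretch of at least 20 amino acids) can be sketched as a sliding-window calculation. The sketch below assumes the two encoded polypeptides have already been aligned without gaps and have equal length; the function names, the default 50% threshold (the lower end of the range stated above) and the toy sequences are illustrative assumptions, and producing the alignment itself is outside the scope of the example.

```python
def best_window_identity(a, b, window=20):
    """Highest fraction of identical residues in any `window`-long stretch.

    Assumes `a` and `b` are equal-length, pre-aligned, ungapped sequences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(a) < window:
        raise ValueError("sequences are shorter than the window")
    matches = [x == y for x, y in zip(a, b)]
    count = sum(matches[:window])
    best = count
    # Slide the window one residue at a time, updating the match count.
    for i in range(window, len(matches)):
        count += matches[i] - matches[i - window]
        best = max(best, count)
    return best / window


def meets_identity_threshold(a, b, window=20, threshold=0.5):
    """True if some stretch of `window` residues reaches the threshold."""
    return best_window_identity(a, b, window) >= threshold


# Toy sequences (illustrative only): every 20-residue window is 50% identical.
seq_a = "A" * 20 + "G" * 10
seq_b = "A" * 10 + "C" * 10 + "G" * 10
print(best_window_identity(seq_a, seq_b))
print(meets_identity_threshold(seq_a, seq_b, threshold=0.7))
```

The additional requirement of 4-5 uniquely specified amino acids with appropriate spacing is not modeled here.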

upstream/downstream - The terms "upstream" and "downstream" are art-
understood terms referring to the position of an element of nucleotide sequence. 
"Upstream" signifies an element that is more 5' than the reference element. 
"Downstream" refers to an element that is more 3' than a reference element. 

intron/exon - The terms "exon" and "intron" are art-understood terms referring 
to various portions of genomic gene sequences. "Exons" are those portions of a 
genomic gene sequence that encode protein. "Introns" are sequences of 
nucleotides found between exons in genomic gene sequences. 

affected - The term "affected", as used herein, refers to those members of a 
kindred that either have developed a characteristic cancer (e.g. colon cancer in 
an HNPCC lineage) and/or are predicted, on the basis of, for example, genetic 
studies, to carry an inherited mutation that confers susceptibility to cancer. 

unique - A "unique" segment, fragment or portion of a gene or protein means a 
portion of a gene or protein which is different sequentially from any other gene 
or protein segment in an individual's genome. As a practical matter, a unique
segment or fragment of a gene will typically be a nucleotide sequence of at least
about 13 bases in length and will be sufficiently different from other gene segments so that
oligonucleotide primers may be designed and used to selectively and specifically 
amplify the segment. A unique segment of a protein is typically an amino acid 
sequence which can be translated from a unique segment of a gene. 

Description of the Invention

We have discovered mammalian genes which are involved in DNA
mismatch repair. One of the genes, hPMS1, encodes a protein which is
homologous to the yeast DNA mismatch repair protein PMS1. We have mapped
the location of hPMS1 to human chromosome 7 and the mouse PMS1 gene to
mouse chromosome 5, band G. Another gene, hMLH1 (MutL Homolog), encodes
a protein which is homologous to the yeast DNA mismatch repair protein MLH1.
We have mapped the location of hMLH1 to human chromosome 3p21-23 and
to mouse chromosome 9, band E.

Studies 1, 2 have demonstrated involvement of a human DNA
mismatch repair gene homolog, hMSH2, on chromosome 2p in HNPCC. Based
upon linkage data, a second HNPCC locus has been assigned to chromosome
3p21-23. 3 Examination of tumor DNA from the chromosome 3-linked kindreds
revealed dinucleotide repeat instability similar to that observed for other HNPCC
families 6 and several types of sporadic tumors. 7-10 Because dinucleotide repeat
instability is characteristic of a defect in DNA mismatch repair, 5, 11, 12 we
reasoned that HNPCC linked to chromosome 3p21-23 could result from a 
mutation in a second DNA mismatch repair gene. 

Repair of mismatched DNA in Escherichia coli requires a number
of genes including mutS, mutL and mutH, defects in any one of which result in
elevated spontaneous mutation rates. 13 Genetic analysis in the yeast
Saccharomyces cerevisiae has identified three DNA mismatch repair genes: a mutS
homolog, MSH2, 14 and two mutL homologs, PMS1 16 and MLH1. 4 Each of these
three genes plays an indispensable role in DNA replication fidelity, including the
stabilization of dinucleotide repeats. 5

We believe that hMLH1 is the HNPCC gene previously linked to
chromosome 3p based upon the similarity of the hMLH1 gene product to the
yeast DNA mismatch repair protein, MLH1, 4 the coincident location of the
hMLH1 gene and the HNPCC locus on chromosome 3, and hMLH1 missense
mutations which we found in affected individuals from chromosome 3-linked
HNPCC families.

Our knowledge of the human and mouse MLH1 and PMS1 gene
structures has many important uses. The gene sequence information can be used
to screen individuals for cancer risk. Knowledge of the gene structures makes it
possible to easily design PCR primers which can be used to selectively amplify
portions of hMLH1 and hPMS1 genes for subsequent comparison to the normal
sequence and cancer risk analysis. This type of testing also makes it possible to
search for and characterize hMLH1 and hPMS1 cancer-linked mutations for the
purpose of eventually focusing the cancer screening effort on specific gene loci.
Specific characterization of cancer-linked mutations in hMLH1 and hPMS1 makes
possible the production of other valuable diagnostic tools such as allele specific
probes which may be used in screening tests to determine the presence or absence
of specific gene mutations.

Additionally, the gene sequence information for hMLH1 and/or
hPMS1 can be used, for example, in a two hybrid system, to search for other
genes of related function which are candidates for cancer involvement.

The hMLH1 and hPMS1 gene structures are useful for making
proteins which are used to develop antibodies directed to specific portions of, or the
complete, hMLH1 and hPMS1 proteins. Such antibodies can then be used to
isolate the corresponding protein and possibly related proteins for research and
diagnostic purposes.

The mouse MLH1 and PMS1 gene sequences are useful for
producing mice that have mutations in the respective gene. The mutant mice are
useful for studying the gene's function, particularly its relationship to cancer.

Methods for Isolating and Characterizing 
Mammalian MLH1 and PMS1 Genes 
We have isolated and characterized four mammalian genes, i.e., 
human MLH1 (hMLH1), human PMS1 (hPMS1), mouse MLH1 (mMLH1) and 
mouse PMS1 (mPMS1). Due to the structural similarity between these genes, the 
methods we have employed to isolate and characterize them are generally the 
same. Figure 1 shows, in broad terms, the experimental approach which we used 
to isolate and characterize the four genes. The following discussion refers to the 
step-by-step procedure shown in Figure 1. 

Step 1 Design of degenerate oligonucleotide pools for PCR 

Earlier reports indicated that portions of three MutL-like proteins, 
two from bacteria, MutL and HexB, and one from yeast, PMS1, are highly 
conserved. 16, 18, 19 After inspection of the amino acid sequences of HexB, MutL and 
PMS1 proteins, as shown in Figure 2, we designed pools of degenerate 
oligonucleotide pairs corresponding to two highly-conserved regions, KELVEN 
and GFRGEA, of the MutL-like proteins. The sequences (SEQ ID NOS: 139 
and 140, respectively) of the degenerate oligonucleotides which we used to isolate 
the four genes are: 

5'-CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC-3' and 
5'-AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA-3'. 
The underlined sequences within the primers are XbaI (TCTAGA) and SacI 
(GAGCTC) restriction endonuclease sites respectively. They were introduced in 
order to facilitate the cloning of the PCR-amplified fragments. In the design of 
the oligonucleotides, we took into account the fact that a given amino acid can 
be coded for by more than one DNA triplet (codon). The degeneracy within 
these sequences is indicated by multiple nucleotides within parentheses or N, for 
the presence of any base at that position. 
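
The degeneracy notation described above can be illustrated with a short sketch. The following Python fragment is not part of the original procedure; it simply parses the two primer strings as written in the text, counts how many distinct oligonucleotides each pool contains, and confirms that the fixed 5' portion of each primer carries the stated restriction site. The function names are hypothetical and chosen for illustration only.

```python
import re

# Primer strings as given in the text (SEQ ID NOS: 139 and 140).
PRIMER_139 = "CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC"
PRIMER_140 = "AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA"


def parse_degenerate(primer):
    """Return one list of allowed bases per primer position."""
    positions = []
    for group, single in re.findall(r"\(([ACGT/]+)\)|([ACGTN])", primer):
        if group:                      # e.g. "T/C" -> ["T", "C"]
            positions.append(group.split("/"))
        elif single == "N":            # N stands for any base
            positions.append(list("ACGT"))
        else:                          # ordinary fixed base
            positions.append([single])
    return positions


def degeneracy(primer):
    """Number of distinct sequences represented by the degenerate pool."""
    count = 1
    for allowed in parse_degenerate(primer):
        count *= len(allowed)
    return count


def fixed_prefix(primer):
    """Leading run of non-degenerate bases (the cloning-site tail)."""
    prefix = []
    for allowed in parse_degenerate(primer):
        if len(allowed) != 1:
            break
        prefix.append(allowed[0])
    return "".join(prefix)


if __name__ == "__main__":
    for name, primer in (("SEQ ID NO: 139", PRIMER_139),
                         ("SEQ ID NO: 140", PRIMER_140)):
        print(name, "length", len(parse_degenerate(primer)),
              "pool size", degeneracy(primer),
              "fixed 5' tail", fixed_prefix(primer))
```

Under this reading of the notation, each pool contains 512 distinct 29-mers, and the restriction sites fall entirely within the non-degenerate 5' tails.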

Step 2 Reverse transcription and PCR on poly A+ selected mRNA 
isolated from human cells 

We isolated messenger (poly A+ enriched) RNA from cultured 
human cells, synthesized double-stranded cDNA from the mRNA, and performed 
PCR with the degenerate oligonucleotides. 4 After trying a number of different 
PCR conditions, for example, adjusting the annealing temperature, we successfully 
amplified a DNA of the size predicted (~210 bp) for a MutL-like protein. 

Step 3 Cloning and sequencing of PCR-generated fragments; 
identification of two gene fragments representing human 
PMS1 and MLH1 

We isolated the PCR amplified material (~210 bp) from an agarose 
gel and cloned this material into a plasmid (pUC19). We determined the DNA 
sequence of several different clones. The amino acid sequence inferred from the 
DNA sequence of two clones showed strong similarity to other known MutL-like 
proteins. 4, 16, 18, 19 The predicted amino acid sequence for one of the clones was 
most similar to the yeast PMS1 protein. Therefore we named it hPMS1, for 
human PMS1. The second clone was found to encode a polypeptide that most 
closely resembles yeast MLH1 protein and was named hMLH1, for human 
MLH1. 

Step 4 Isolation of complete human and mouse PMS1 and MLH1 
cDNA clones using the PCR fragments as probes 

We used the 210 bp PCR-generated fragments of the hMLH1 and 
hPMS1 cDNAs as probes to screen both human and mouse cDNA libraries (from 
Stratagene, or as described in reference 30). A number of cDNAs were isolated 
that corresponded to these two genes. Many of the cDNAs were truncated at the 
5' end. Where necessary, PCR techniques 31 were used to obtain the 5'-end of 
the gene in addition to further screening of cDNA libraries. Complete composite 
cDNA sequences were used to predict the amino acid sequence of the human and 
mouse MLH1 and PMS1 proteins. 

Step 5 Isolation of human and mouse PMS1 and MLH1 genomic 
clones 

Information on genomic and cDNA structure of the human MLH1 
and PMS1 genes is necessary in order to thoroughly screen for mutations in 
cancer prone families. We have used human cDNA sequences as probes to 
isolate the genomic sequences of human PMS1 and MLH1. We have isolated 
four cosmids and two P1 clones for hPMS1 that together are likely to contain 
most, if not all, of the cDNA (exon) sequence. For hMLH1 we have isolated four 
overlapping λ-phage clones containing 5'-MLH1 genomic sequences and four P1 
clones (two full length clones and two which include the 5' coding end plus 
portions of the promoter region). PCR analysis using pairs of 
oligonucleotides specific to the 5' and 3' ends of the hMLH1 cDNA clearly 
indicates that the P1 clone contains the complete hMLH1 cDNA information. 
Similarly, genomic clones for mouse PMS1 and MLH1 genes have been isolated 
and partially characterized (described in Step 8). 

Step 6 Chromosome positional mapping of the human and mouse 
PMS1 and MLH1 genes by fluorescence in situ hybridization 

We used genomic clones isolated from human and mouse PMS1 and 
MLH1 for chromosomal localization by fluorescence in situ hybridization 
(FISH). 20, 21 We mapped the human MLH1 gene to chromosome 3p21.3-23, shown 
in Figure 7 and discussed in more detail below. We mapped the mouse MLH1 
gene to chromosome 9 band E, a region of synteny between mouse and human. 22 
In addition to FISH techniques, we used PCR with a pair of hMLH1-specific 
oligonucleotides to analyze DNA from a rodent/human somatic cell hybrid 
mapping panel (Coriell Institute for Medical Research, Camden, NJ). Our PCR 
results with the panel clearly indicate that hMLH1 maps to chromosome 3. The 
position of hMLH1 at 3p21.3-23 is coincident with a region known to harbor a 
second locus for HNPCC based upon linkage data. 

We mapped the hPMS1 gene, as shown in Figure 12, to the long (q) 
arm of chromosome 7 (either 7q11 or 7q22) and the mouse PMS1 to chromosome 
5 band G, two regions of synteny between the human and the mouse. 22 We 
performed PCR using oligonucleotides specific to hPMS1 on DNA from a 
rodent/human cell panel. In agreement with the FISH data, the location of 
hPMS1 was confirmed to be on chromosome 7. These observations assure us that 
our human map position for hPMS1 on chromosome 7 is correct. The physical 
localization of hPMS1 is useful for the purpose of identifying families which may 
potentially have a cancer linked mutation in hPMS1. 

Step 7 Using genomic and cDNA sequences to identify mutations 
in hPMS1 and hMLH1 genes from HNPCC Families 

We have analyzed samples collected from individuals in HNPCC 
families for the purpose of identifying mutations in hPMS1 or hMLH1 genes. Our 
approach is to design PCR primers based on our knowledge of the gene 
structures, to obtain exon/intron segments which we can compare to the known 
normal sequences. We refer to this approach as "exon-screening". 

Using cDNA sequence information we have designed and are 
continuing to design hPMS1 and hMLH1 specific oligonucleotides to delineate 
exon/intron boundaries within genomic sequences. The hPMS1 and hMLH1 
specific oligonucleotides were used to probe genomic clones for the presence of 
exons containing that sequence. Oligonucleotides that hybridized were used as 
primers for DNA sequencing from the genomic clones. Exon-intron junctions 
were identified by comparing genomic with cDNA sequences. 

Amplification of specific exons from genomic DNA by PCR and 
sequencing of the products is one method to screen HNPCC families for 
mutations. 1, 2 We have identified genomic clones containing hMLH1 cDNA 
information and have determined the structures of all intron/exon boundary 
regions which flank the 19 exons of hMLH1. 

We have used the exon-screening approach to examine the MLH1 
gene of individuals from HNPCC families showing linkage to chromosome 3. 3 As 
will be discussed in more detail below, we identified a mutation in the MLH1 
gene of one such family, consisting of a C to T substitution. We predict that the 
C to T mutation causes a serine to phenylalanine substitution in a 
highly-conserved region of the protein. We are continuing to identify HNPCC 
families from whom we can obtain samples in order to find additional mutations 
in hMLH1 and hPMS1 genes. 
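
The comparison step in exon-screening can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used: the two nine-nucleotide fragments are hypothetical placeholders (they are not hMLH1 sequence), the reading frame is assumed to start at the first nucleotide, and the function name is invented for the example. The sketch shows how a single C to T change in the second position of a serine codon (TCT) yields a phenylalanine codon (TTT).

```python
# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def find_substitutions(reference, sample):
    """Compare two equal-length, in-frame coding fragments.

    Returns a list of tuples:
    (1-based position, reference base, sample base,
     reference amino acid, sample amino acid).
    """
    reference = reference.upper()
    sample = sample.upper()
    if len(reference) != len(sample):
        raise ValueError("fragments must have the same length")
    results = []
    for i, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if ref_base != alt_base:
            start = i - i % 3          # first base of the affected codon
            ref_aa = CODON_TABLE.get(reference[start:start + 3], "?")
            alt_aa = CODON_TABLE.get(sample[start:start + 3], "?")
            results.append((i + 1, ref_base, alt_base, ref_aa, alt_aa))
    return results


if __name__ == "__main__":
    # Hypothetical fragments for illustration; not taken from hMLH1.
    normal = "GCTTCTGGA"    # Ala-Ser-Gly
    patient = "GCTTTTGGA"   # Ala-Phe-Gly
    print(find_substitutions(normal, patient))
```

In practice the reference would be the wild-type exon sequence and the sample would be the sequenced PCR product, aligned so that the reading frame is known.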

We are also using a second approach to identify mutations in 
hPMS1 and hMLH1. The approach is to design hPMS1 or hMLH1 specific 
oligonucleotide primers to produce first-strand cDNA by reverse transcription of 
RNA. PCR using gene-specific primers will allow us to amplify specific regions 
from these genes. DNA sequencing of the amplified fragments will allow us to 
detect mutations. 

Step 8 Design targeting vectors to disrupt mouse PMS1 and MLH1 

genes in ES cells; study mice deficient in mismatch repair. 
We constructed a gene targeting vector based on our knowledge of 
the genomic mouse PMS1 DNA structure. We used the vector to disrupt the 
PMS1 gene in mouse embryonic stem cells. 36 The cells were injected into mouse 
blastocysts which developed into mice that are chimeric (mixtures) for cells 
carrying the PMS1 mutation. The chimeric animals will be used to breed mice 
that are heterozygous and homozygous for the PMS1 mutation. These mice will 
be useful for studying the role of the PMS1 gene in the whole organism. 

Human MLH1 

The following discussion is a more detailed explanation of our 
experimental work relating to hMLH1. As mentioned above, to clone mammalian 
MLH genes, we used PCR techniques like those used to identify the yeast MSH1, 
MSH2 and MLH1 genes and the human MSH2 gene. 1, 2, 4, 14 As template in the 
PCR, we used double-stranded cDNA synthesized from poly(A+) enriched RNA 
prepared from cultured primary human fibroblasts. The degenerate 
oligonucleotides were targeted at the N-terminal amino acid sequences KELVEN 
and GFRGEA (see Figure 3), two of the most conserved regions of the MutL 
family of proteins previously described for bacteria and yeast. 16, 18, 19 Two PCR 
products of the predicted size were identified, cloned and shown to encode a 
predicted amino acid sequence with homology to MutL-like proteins. These two 
fragments generated by PCR were used to isolate human cDNA and genomic 
DNA clones. 

The oligonucleotide primers which we used to amplify human 
MutL-related sequences were 
5'-CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC-3' (SEQ ID NO: 
139) and 5'-AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA-3' 
(SEQ ID NO: 140). PCR was carried out in 50 µL reactions containing cDNA 
template, 1.0 µM each primer, 5 IU of Taq polymerase (C), 50 mM KCl, 10 mM 
Tris buffer pH 7.5 and 1.5 mM MgCl2. PCR was carried out for 35 cycles of 1 
minute at 94°C, 1 minute at 43°C and 1.5 minutes at 62°C. Fragments of the 
expected size, approximately 212 bp, were cloned into pUC19 and sequenced. 
The cloned MLH1 PCR products were labeled with a random primer labeling kit 
(RadPrime, Gibco BRL) and used to probe human cDNA and genomic cosmid 
libraries by standard procedures. DNA sequencing of double-stranded plasmid 
DNAs was performed as previously described. 1 

The hMLH1 cDNA nucleotide sequence as shown in Figure 3 
encodes an open reading frame of 2268 bp. Also shown in Figure 3 is the 
predicted protein sequence encoded by the hMLH1 cDNA. The underlined 
DNA sequences are the regions of cDNA that correspond to the degenerate PCR 
primers that were originally used to amplify a portion of the MLH1 gene 
(nucleotides 118-135 and 343-359). 

Figure 4A shows 19 nucleotide sequences corresponding to portions 
of hMLH1. Each sequence includes one of the 19 exons in its entirety, 
surrounded by flanking intron sequences. Target PCR primer sites are 
underlined. More details relating to the derivation and uses of the sequences 
shown in Figure 4A are set forth below. 

As shown in Figure 5, the hMLH1 protein is comprised of 756 
amino acids and shares 41% identity with the protein product of the yeast DNA 
mismatch repair gene, MLH1. 4 The regions of the hMLH1 protein most similar 
to yeast MLH1 correspond to amino acids 11 through 317, showing 55% identity, 
and the last 13 amino acids, which are identical between the two proteins. Figure 
5 shows an alignment of the predicted human MLH1 and S. cerevisiae MLH1 
protein sequences. Amino acid identities are indicated by boxes, and gaps are 
indicated by dashes. The pairwise protein sequence alignment was performed 
with DNAStar MegAlign using the clustal method. 27 Pairwise alignment 
parameters were a ktuple of 1, gap penalty of 3, window of 5 and diagonals of 5. 
Furthermore, as shown in Figure 13, the predicted amino acid sequences of the 
human and mouse MLH1 proteins show at least 74% identity. 

Figure 6 shows a phylogenetic tree of MutL-related proteins. The 
phylogenetic tree was constructed using the predicted amino acid sequences of 7 
MutL-related proteins: human MLH1; mouse MLH1; S. cerevisiae MLH1; S. 
cerevisiae PMS1; E. coli MutL; S. typhimurium MutL and S. pneumoniae HexB. 
Required sequences were obtained from GenBank release 7.3. The phylogenetic 
tree was generated with the PILEUP program of the Genetics Computer Group 
software using a gap penalty of 3 and a length penalty of 0.1. The recorded DNA 
sequences of hMLH1 and hPMS1 have been submitted to GenBank. 


hMLH1 Intron Location and Intron/Exon Boundary Structures 

In our previous U.S. Patent Application No. 08/209,521, we 
described the nucleotide sequence of a complementary DNA (cDNA) clone of a 
human gene, hMLH1. The cDNA sequence of hMLH1 (SEQ ID NO: 4) is 
presented in this application in Figure 3. We note that there may be some 
variability between individuals' hMLH1 cDNA structures, resulting from 
polymorphisms within the human population, and the degeneracy of the genetic 
code. 

In the present application, we report the results of our genomic 
sequencing studies. Specifically, we have cloned the human genomic region that 
includes the hMLH1 gene, with specific focus on individual exons and surrounding 
intron/exon boundary structures. Toward the ultimate goal of designing a 
comprehensive and efficient approach to identify and characterize mutations 
which confer susceptibility to cancer, we believe it is important to know the 
wild-type sequences of intron structures which flank exons in the hMLH1 gene. 
One advantage of knowing the sequence of introns near the exon boundaries is 
that it makes it possible to design primer pairs for selectively amplifying entire 
individual exons. More importantly, it is also possible that a mutation in an 
intron region, which, for example, may cause a mRNA splicing error, could result 
in a defective gene product, i.e., susceptibility to cancer, without showing any 
abnormality in an exon region of the gene. We believe a comprehensive 
screening approach requires searching for mutations, not only in the exon or 
cDNA, but also in the intron structures which flank the exon boundaries. 

We have cloned the human genomic region that includes hMLH1 
using approaches which are known in the art, and other known approaches could 
have been used. We used PCR to screen a P1 human genomic library for the 
hMLH1 gene. We obtained four clones, two that contained the whole gene and 
two which lacked the C-terminus. We characterized one of the full length clones 
by cycle sequencing, which resulted in our definition of all intron/exon junction 
sequences for both sides of the 19 hMLH1 exons. We then designed multiple sets 
of PCR primers to amplify each individual exon (first stage primers) and verified 
the sequence of each exon and flanking intron sequence by amplifying several 
different genomic DNA samples and sequencing the resulting fragments using an 
ABI 373 sequencer. In addition, we have determined the sizes of each hMLH1 
exon using PCR methods. Finally, we devised a set of nested PCR primers 
(second stage primers) for reamplification of individual exons. We have used the 
second stage primers in a multiplex method for analyzing HNPCC families and 
tumors for hMLH1 mutations. Generally, in the nested PCR primer approach, 
we perform a first multiplex amplification with four to eight sets of "first stage" 
primers, each directed to a different exon. We then reamplify individual exons 
from the product of the first amplification step, using a single set of second stage 
primers. Examples and further details relating to our use of the first and second 
stage primers are set forth below. 

Through our genomic sequencing studies, we have identified all 
nineteen exons within the hMLH1 gene, and have mapped the intron/exon 
boundaries. One aspect of the invention, therefore, is the individual exons of the 
hMLH1 gene. Table 1 presents the nucleotide coordinates (i.e., the point of 
insertion of each intron within the coding region of the gene) of the hMLH1 
exons (SEQ ID NOS: 25-43). The presented coordinates are based on the 
hMLH1 cDNA sequence, assigning position "1" to the "A" of the start "ATG" 
(which A is nucleotide 1 in SEQ ID NO: 4). 

Table 1 

Intron Number    cDNA Sequence Coordinates 
intron 1         116 & 117 
intron 2         207 & 208 
intron 3         306 & 307 
intron 4         380 & 381 
intron 5         453 & 454 
intron 6         545 & 546 
intron 7         592 & 593 
intron 8         677 & 678 
intron 9         790 & 791 
intron 10        884 & 885 
intron 11        1038 & 1039 
intron 12        1409 & 1410 
intron 13        1558 & 1559 
intron 14        1667 & 1668 
intron 15        1731 & 1732 
intron 16        1896 & 1897 
intron 17        1989 & 1990 
intron 18        2103 & 2104 


We have also determined the nucleotide sequence of intron regions 
which flank exons of the hMLH1 gene. SEQ ID NOS: 6-24 are individual exon 
sequences bounded by their respective upstream and downstream intron 
sequences. The same nucleotide structures are shown in Fig. 4A, where the exons 
are numbered from N-terminus to C-terminus with respect to the chromosomal 
locus. The 5-digit numbers indicate the primers used to amplify the exon. All 
sequences are numbered assuming the A of the ATG codon is nucleotide 1. The 
numbers in ( ) are the nucleotide coordinates of the coding sequence found in the 
indicated exon. Uppercase is intron. Lowercase is exon or non-translated 
sequences found in the mRNA/cDNA clone. Lowercase and underlined 
sequences correspond to primers. The stop codon at 2269-2271 is in italics and 
underlined. 

Table 2 presents the sequences of primer pairs ("first stage" primers) 
which we have used to amplify individual exons together with flanking intron 
structures. 



Table 2 

EXON  PRIMER      PRIMER  PRIMER      PRIMER NUCLEOTIDE 
NO.   LOCATION    NO.     SEQ ID NO   SEQUENCE 
1     upstream    18442   44          5'aggcactgaggtgattggc 
1     downstream  19109   45          5'tcgtagcccttaagtgagc 
2     upstream    19689   46          5'aatatgtacattagagtagttg 
2     downstream  19688   47          5'cagagaaaggtcctgactc 
3     upstream    19687   48          5'agagatttggaaaatgagtaac 
3     downstream  19786   49          5'acaatgtcatcacaggagg 
4     upstream    18492   50          5'aacctttccctttggtgagg 
4     downstream  18421   51          5'gattactctgagacctaggc 
5     upstream    18313   52          5'gattttctcttttccccttggg 
5     downstream  18179   53          5'caaacaaagcttcaacaatttac 
6     upstream    18318   54          5'gggttttattttcaagtacttctatg 
6     downstream  18317   55          5'gctcagcaactgttcaatgtatgagc 
7     upstream    19009   56          5'ctagtgtgtgtttttggc 
7     downstream  19135   57          5'cataaccttatctccacc 
8     upstream    18197   58          [illegible] 
8     downstream  18924   59          [illegible] 
9     upstream    18765   60          [illegible] 
9     downstream  18198   61          [illegible] 
10    upstream    18305   62          [illegible] 
10    downstream  18306   63          [illegible] 
11    upstream    18182   64          [illegible] 
11    downstream  19041   65          5'aaaatctgggctctcacg 
12    upstream    18579   66          5'aattatacctcatactagc 
12    downstream  18178   67          5'gttttattacagaataaaggagg 
12    downstream  19070   68          5'aagccaaagttagaaggca 
13    upstream    18420   69          5'tgcaacccacaaaatttggc 
13    downstream  18443   70          5'ctttctccatttccaaaacc 
14    upstream    19028   71          5'tggtgtctctagttctgg 
14    downstream  18897   72          5'cattgttgtagtagctctgc 
15    upstream    19025   73          5'cccatttgtcccaactgg 
15    downstream  18575   74          5'cggtcagttgaaatgtcag 
16    upstream    18184   75          5'catttggatgctccgttaaagc 
16    downstream  18314   76          5'cacccggctggaaattttatttg 
17    upstream    18429   77          5'ggaaaggcactggagaaatggg 
17    downstream  18315   78          5'ccctccagcacacatgcatgtaccg 
18    upstream    18444   79          5'taagtagtctgtgatctccg 
18    downstream  18581   80          5'atgtatgaggtcctgtcc 
19    upstream    18638   81          5'gacaccagtgtatgttgg 
19    downstream  18637   82          5'gagaaagaagaacacatccc 



Additionally, we have designed a set of "second stage" amplification 
primers, the structures of which are shown below in Table 3. We use the second 
stage primers in conjunction with the first stage primers in a nested amplification 
protocol, as described below. 

Table 3 

EXON  PRIMER      PRIMER  PRIMER      PRIMER NUCLEOTIDE 
NO.   LOCATION    NO.     SEQ ID NO   SEQUENCE 
1     upstream    19295   83          5'tgtaaaacgacggccagtcactgaggtgattggctgaa 
1     downstream  19446   84          *5'tagcccttaagtgagcccg 
2     upstream    18685   85          5'tgtaaaacgacggccagttacattagagtagttgcaga 
2     downstream  19067   86          *5'aggtcctgactcttccatg 
3     upstream    18687   87          5'tgtaaaacgacggccagtttggaaaatgagtaacatgatt 
3     downstream  19068   88          *5'tgtcatcacaggaggatat 
4     upstream    19294   89          5'tgtaaaacgacggccagtctttccctttggtgaggtga 
4     downstream  19077   90          *5'tactctgagacctaggccca 
5     upstream    19301   91          5'tgtaaaacgacggccagttctcttttccccttgggattag 
5     downstream  19046   92          *5'acaaagcttcaacaatttactct 
6     upstream    19711   93          5'tgtaaaacgacggccagtgttttattttcaagtacttctatgaatt 
6     downstream  19079   94          *5'cagcaactgttcaatgtatgagcact 
7     upstream    19293   95          5'tgtaaaacgacggccagtgtgtgtgtttttggcaac 
7     downstream  19435   96          *5'aaccttatctccaccagc 
8     upstream    19329   97          5'tgtaaaacgacggccagtagccatgagacaataaatccttg 
8     downstream  19450   98          *5'tcccaaataatgtgatggaatg 
9     upstream    19608   99          5'tgtaaaacgacggccagtaagcttcagaatctctttt 
9     downstream  19449   100         *5'tgggtgtttcctgtgagtggatt 
10    upstream    19297   101         5'tgtaaaacgacggccagtactttgtgtgaatgtacacctgtg 
10    downstream  19081   102         *5'gagagcctgatagaacatctgttg 
11    upstream    19486   103         5'tgtaaaacgacggccagtctttttctccccctcccacta 
11    downstream  19455   104         *5'tctgggctctcacgtct 
12    upstream    20546   105         *5'cttattctgagtctctcc 
12    downstream  20002   106         5'tgtaaaacgacggccagtgtttgctcagaggctgc 
12    upstream    19829   107         *5'gatggttcgtacagattcccg 
12    downstream  19385   108         5'tgtaaaacgacggccagtttattacagaataaaggaggtag 
13    upstream    19300   109         5'tgtaaaacgacggccagtaacccacaaaatttggctaag 
13    downstream  19078   110         *5'tctccatttccaaaaccttg 
14    upstream    19456   111         *5'tgtctctagttctggtgc 
14    downstream  19472   112         5'tgtaaaacgacggccagttgttgtagtagctctgcttg 
15    upstream    19697   113         *5'atttgtcccaactggttgta 
15    downstream  19466   114         5'tgtaaaacgacggccagttcagttgaaatgtcagaaagtg 
16    upstream    19269   115         5'tgtaaaacgacggccagt 
16    downstream  19047   116         *5'ccggctggaaattttatttggag 
17    upstream    19298   117         5'tgtaaaacgacggccagtaggcactggagaaatgggatttg 
17    downstream  19080   118         *5'tccagcacacatgcatgtaccgaaat 
18    upstream    19436   119         *5'gtagtctgtgatctccgttt 
18    downstream  19471   120         5'tgtaaaacgacggccagttatgaggtcctgtcctag 
19    upstream    19447   121         *5'accagtgtatgttgggatg 
19    downstream  19330   122         5'tgtaaaacgacggccagtgaaagaagaacacatcccaca 



In Table 3 an asterisk (*) indicates that the 5' nucleotide is 
biotinylated. Exons 1-7, 10, 13 and 16-19 can be specifically amplified in PCR 
reactions containing either 1.5 mM or 3 mM MgCl2. Exons 11 and 14 can only 
be specifically amplified in PCR reactions containing 1.5 mM MgCl2, and exons 
8, 9, 12 and 15 can only be specifically amplified in PCR reactions containing 3 
mM MgCl2. With respect to exon 12, the second stage amplification primers 
have been designed so that exon 12 is reamplified in two halves. The 20546 and 
20002 primer set amplifies the N-terminal half. The primer set 19829 and 19835 
amplifies the C-terminal half. An alternate primer for 18178 is 19070. 
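
The magnesium requirements just listed constrain which exons can share a multiplex reaction. The Python sketch below is offered only as an illustration of that bookkeeping: it encodes the three exon groups stated above and reports which MgCl2 concentrations, if any, are compatible with every exon in a proposed set. The names used are hypothetical, and the sketch says nothing about other factors (such as primer interactions) that also affect multiplexing.

```python
# MgCl2 concentrations (mM) at which each exon amplifies specifically,
# as stated in the text following Table 3.
EITHER = [1, 2, 3, 4, 5, 6, 7, 10, 13, 16, 17, 18, 19]
ONLY_1_5_MM = [11, 14]
ONLY_3_MM = [8, 9, 12, 15]

MG_BY_EXON = {}
for exon in EITHER:
    MG_BY_EXON[exon] = {1.5, 3.0}
for exon in ONLY_1_5_MM:
    MG_BY_EXON[exon] = {1.5}
for exon in ONLY_3_MM:
    MG_BY_EXON[exon] = {3.0}


def compatible_mgcl2(exons):
    """Return the set of MgCl2 concentrations usable for all given exons."""
    usable = {1.5, 3.0}
    for exon in exons:
        usable &= MG_BY_EXON[exon]
    return usable


if __name__ == "__main__":
    for group in ([1, 2, 13], [1, 11, 14], [8, 9, 12, 15], [8, 11]):
        options = sorted(compatible_mgcl2(group))
        label = ", ".join(f"{mm} mM" for mm in options) or "none"
        print("exons", group, "->", label)
```

A set that mixes exons from the 1.5 mM only group with exons from the 3 mM only group returns no shared concentration and would need to be split across separate reactions.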

The hMLH1 sequence information provided by our studies and 
disclosed in this application and preceding related applications may be used to 
design a large number of different oligonucleotide primers for use in identifying 
hMLH1 mutations that correlate with cancer susceptibility and/or with tumor 
development in an individual, including primers that will amplify more than one 
exon (and/or flanking intron sequences) in a single product band. 

One of ordinary skill in the art would be familiar with 
considerations important to the design of PCR primers for use to amplify the 
desired fragment or gene. 37 These considerations may be similar, though not 
necessarily identical, to those involved in design of sequencing primers, as 
discussed above. Generally it is important that primers hybridize relatively 
specifically (i.e. have a Tm of greater than about 55°C and preferably 
around 60°C). For most cases, primers between about 17 and 25 
nucleotides in length work well. Longer primers can be useful for amplifying 
longer fragments. In all cases, it is desirable to avoid using primers that are 
complementary to more than one sequence in the human genome, so that each 
pair of PCR primers amplifies only a single, correct fragment. Nevertheless, it is 
only absolutely necessary that the correct band be distinguishable from other 
product bands in the PCR reaction. 
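
The length and Tm guidelines above lend themselves to a quick screening calculation. The text does not state how Tm was estimated, so the Python sketch below assumes the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, which is a rough approximation for short oligonucleotides and is used here only for illustration. The function names are hypothetical; the two example primers are the exon 13 first stage primers from Table 2 (SEQ ID NOS: 69 and 70).

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C).

    This is an assumed estimate for illustration; the source does not say
    which Tm formula was used.
    """
    primer = primer.lower()
    at = primer.count("a") + primer.count("t")
    gc = primer.count("g") + primer.count("c")
    return 2 * at + 4 * gc


def meets_guidelines(primer, min_tm=55, min_len=17, max_len=25):
    """Check a primer against the length and Tm guidance given in the text."""
    return min_len <= len(primer) <= max_len and wallace_tm(primer) > min_tm


if __name__ == "__main__":
    primers = {
        "SEQ ID NO: 69": "tgcaacccacaaaatttggc",
        "SEQ ID NO: 70": "ctttctccatttccaaaacc",
    }
    for name, seq in primers.items():
        print(name, "length", len(seq), "Tm approx", wallace_tm(seq),
              "ok" if meets_guidelines(seq) else "review")
```

More accurate estimates (for example nearest-neighbor methods with salt correction) would give somewhat different values; the sketch is meant only to show how the stated thresholds could be applied.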

The exact PCR conditions (e.g. salt concentration, number of cycles, 
type of DNA polymerase, etc.) can be varied as known in the art to improve, for 
example, yield or specificity of the reaction. In particular, we have found it 
valuable to use nested primers in PCR reactions in order to reduce the amount 
of required DNA substrate and to improve amplification specificity. 

Two examples follow. The first example illustrates use of a first 
stage primer pair (SEQ ID NOS: 69 and 70) to amplify intron/exon segment 
(SEQ ID NO: 18). The second example illustrates use of second stage primers 
to amplify a target intron/exon segment from the product of a first PCR 
amplification step employing first stage primers. 

EXAMPLE 1: Amplification of hMLH1 genomic clones from a P1 
phage library 

25 ng genomic DNA (or 1 ng of a P1 phage can be used) was used 
in PCR reactions including: 
0.05 mM dNTPs 
50 mM KCl 
3 mM Mg 
10 mM Tris-HCl pH 8.5 
0.01% gelatin 
5 µM primers 

Reactions were performed on a Perkin-Elmer Cetus model 9600 thermal cycler. 
Reactions were incubated at 95°C for 5 minutes, followed by 35 cycles 
(30 cycles from a P1 phage) of: 

94°C for 30 seconds 
55°C for 30 seconds 
72°C for 1 minute. 
A final 7 minute extension reaction was then performed at 72°C. 

Desirable PI clones were those from which an approximately bp product band 
was produced. 

EXAMPLE 2: Amplification of hMLH1 sequences from genomic 
DNA using nested PCR primers 

We performed two-step PCR amplification of hMLH1 sequences 
from genomic DNA as follows. Typically, the first amplification was performed 
in a 25 microliter reaction including: 

25 ng of chromosomal DNA 
Perkin-Elmer PCR buffer II (any suitable buffer could be used) 
3 mM MgCl2 
50 µM each dNTP 
Taq DNA polymerase 
5 µM primers (SEQ ID NOS: 69, 70) 

and incubated at 95°C for 5 minutes, followed by 20 cycles of: 

94°C for 30 seconds 
55°C for 30 seconds. 

The product band was typically small enough (less than approximately 500 bp) 
that separate extension steps were not performed as part of each cycle. Rather, 
a single extension step was performed, at 72°C for 7 minutes, after the 
20 cycles were completed. Reaction products were stored at 4°C. 

The second amplification reaction, usually 25 or 50 microliters in 
volume, included: 

1 or 2 microliters (depending on the volume of the reaction) of the 
first amplification reaction product 

Perkin-Elmer PCR buffer II (any suitable buffer could be used) 

3mM or MgCi 2 

50 µM each dNTP 

Taq DNA polymerase 

5 µM nested primers (SEQ ID NOS: 109, 110), 
and was incubated at 95°C for 5 minutes, followed by 20-25 cycles of: 

94°C for 30 seconds 

55°C for 30 seconds. 
A single extension step was performed, at 72°C for 7 minutes, after the 
cycles were completed. Reaction products were stored at 4°C. 
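
For planning purposes, the programmed hold time of each cycling profile described in Examples 1 and 2 can be added up as follows. This Python sketch is illustrative only: it sums the stated hold times and ignores temperature ramping, which depends on the thermal cycler, so actual run times will be longer. The function name is hypothetical.

```python
def hold_time_seconds(initial_denaturation, cycle_steps, cycles, final_extension):
    """Sum the programmed hold times of a PCR profile (ramp times excluded)."""
    return initial_denaturation + cycles * sum(cycle_steps) + final_extension


if __name__ == "__main__":
    profiles = {
        # Example 1: 95 C 5 min; 35 x (94 C 30 s, 55 C 30 s, 72 C 1 min); 72 C 7 min
        "Example 1 (genomic DNA)": hold_time_seconds(300, [30, 30, 60], 35, 420),
        # Example 2, first stage: 95 C 5 min; 20 x (94 C 30 s, 55 C 30 s); 72 C 7 min
        "Example 2, first stage": hold_time_seconds(300, [30, 30], 20, 420),
        # Example 2, second stage at the upper bound of 25 cycles
        "Example 2, second stage (25 cycles)": hold_time_seconds(300, [30, 30], 25, 420),
    }
    for name, seconds in profiles.items():
        print(f"{name}: {seconds} s ({seconds / 60:.0f} min)")
```

Omitting the per-cycle extension step, as in Example 2, shortens each cycle from two minutes to one minute of hold time.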

Any set of primers capable of amplifying a target hMLH1 sequence 
can be used in the first amplification reaction. We have used each of the primer 
sets presented in Table 2 to amplify an individual hMLH1 exon in the first 
amplification reaction. We have also used combinations of those primer sets, 
thereby amplifying multiple individual hMLH1 exons in the first amplification 
reaction. 

The nested primers used in the second amplification step were
designed relative to the primers used in the first amplification reaction. That is,
where a single set of primers is used in the first amplification reaction, the
primers used in the second amplification reaction should be identical to the
primers used in the first reaction except that the primers used in the second
reaction should not include the 5'-most nucleotides of the first amplification
reaction primers, and should extend sufficiently more at the 3' end that the Tm of
the second amplification primers is approximately the same as the Tm of the first
amplification reaction primers. Our second reaction primers typically lacked the
three 5'-most nucleotides of the first amplification reaction primers, and extended
approximately 3-6 nucleotides farther on the 3' end. SEQ ID NOS: 109, 110 are
examples of nested primer pairs that could be used in a second amplification
reaction when SEQ ID NOS: 69 and 70 were used in the first amplification
reaction.
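
The nested-primer design rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the Tm estimate uses the simple Wallace rule (2°C per A or T, 4°C per G or C), which is not necessarily the method we used; the function names and the example template are hypothetical; and only a forward-strand primer is handled (a reverse primer would be processed the same way on the reverse complement of the template).

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees C (Wallace rule, an assumption here)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def nested_primer(template: str, outer: str, trim5: int = 3,
                  min_ext: int = 3, max_ext: int = 6) -> str:
    """Derive a nested primer from an outer (first-reaction) primer.

    Drops `trim5` nucleotides from the 5' end of `outer`, then extends the
    3' end along `template` by min_ext..max_ext nucleotides, keeping the
    extension whose estimated Tm is closest to that of the outer primer.
    """
    start = template.index(outer)          # ValueError if primer not in template
    end = start + len(outer)
    target_tm = wallace_tm(outer)
    candidates = [
        template[start + trim5:end + ext]
        for ext in range(min_ext, max_ext + 1)
        if end + ext <= len(template)
    ]
    if not candidates:
        raise ValueError("template too short to extend the primer")
    return min(candidates, key=lambda c: abs(wallace_tm(c) - target_tm))


# Hypothetical template and outer primer, for illustration only.
template = "GGATCCATGACGTTAGCCATGGCTAAGCTTGACCGTAACGT"
outer = template[3:23]
inner = nested_primer(template, outer)
print(outer, wallace_tm(outer))
print(inner, wallace_tm(inner))
```

The defaults (3 nucleotides trimmed, 3-6 added) mirror the typical values given in the text; any other Tm model could be substituted for `wallace_tm` without changing the selection logic.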

We have also found that it can be valuable to include a standard
sequence at the 5' end of one of the second amplification reaction primers to
prime sequencing reactions. Additionally, we have found it useful to biotinylate
the last nucleotide of one or both of the second amplification reaction primers
so that the product band can easily be purified using magnetic beads 40 and then
sequencing reactions can be performed directly on the bead-associated
products. 41-45

For additional discussion of multiplex amplification and sequencing
methods, see References by Zu et al. and Espelund et al. 46, 47

hMLH1 Link to Cancer

As a first step to determine whether hMLH1 was a candidate for
the HNPCC locus on human chromosome 3p21-23, 3 we mapped hMLH1 by
fluorescence in situ hybridization (FISH). 20,21 We used two separate genomic
fragments (data not shown) of the hMLH1 gene in FISH analysis. Examination
of several metaphase chromosome spreads localized hMLH1 to chromosome
3p21.3-23.

Panel A of Figure 7 shows hybridization of hMLH1 probes in a
metaphase spread. Biotinylated hMLH1 genomic probes were hybridized to
banded human metaphase chromosomes as previously described. 20,21 Detection
was performed with fluorescein isothiocyanate (FITC)-conjugated avidin (green
signal); chromosomes, shown in blue, were counterstained with
4',6-diamidino-2-phenylindole (DAPI). Images were obtained with a cooled CCD
camera, enhanced, pseudocoloured and merged with the following programs: CCD
Image Capture; NIH Image 1.4; Adobe Photoshop and Genejoin Maxpix respectively.
Panel B of Figure 7 shows a composite of chromosome 3 from multiple
metaphase spreads aligned with the human chromosome 3 ideogram. Region of
hybridization (distal portion of 3p21.3-23) is indicated in the ideogram by a
vertical bar.

As independent confirmation of the location of hMLH1 on
chromosome 3, we used both PCR with a pair of hMLH1-specific oligonucleotides
and Southern blotting with a hMLH1-specific probe to analyze DNA from the
NIGMS2 rodent/human cell panel (Coriell Inst. for Med. Res., Camden, NJ,
USA). Results of both techniques indicated chromosome 3 linkage. We also
mapped the mouse MLH1 gene by FISH to chromosome 9 band E. This is a
position of synteny to human chromosome 3p. 22 Therefore, the hMLH1 gene
localizes to 3p21.3-23, within the genomic region implicated in chromosome 3-
linked HNPCC families. 3

Next, we analyzed blood samples from affected and unaffected
individuals from two chromosome-3 candidate HNPCC families 3 for mutations.
One family, Family 1, showed significant linkage (lod score = 3.01 at a
recombination fraction of 0) between HNPCC and a marker on 3p. For the
second family, Family 2, the reported lod score (1.02) was below the commonly
accepted level of significance, and thus only suggested linkage to the same marker
on 3p. Subsequent linkage analysis of Family 2 with the microsatellite marker
D3S1298 on 3p21.3 gave a more significant lod score of 1.88 at a recombination
fraction of 0. Initially, we screened for mutations in two PCR-amplified exons of
the hMLH1 gene by direct DNA sequencing (Figure 4). We examined these two
exons from three affected individuals of Family 1, and did not detect any
differences from the expected sequence. In Family 2, we observed that four
individuals affected with colon cancer are heterozygous for a C to T substitution
in an exon encoding amino acids 41-69, which corresponds to a highly-conserved
region of the protein (Figure 9). For one affected individual, we screened PCR-
amplified cDNA for additional sequence differences. The combined sequence
information obtained from the two exons and cDNA of this one affected
individual represents 95% (i.e. all but the first 116 bp) of the open reading frame.
We observed no nucleotide changes other than the C to T substitution. In
addition, four individuals from Family 2, predicted to be carriers based upon
linkage data, and as yet unaffected with colon cancer, were found to be
heterozygous for the same C to T substitution. Two of these predicted carriers
are below and two are above the mean age of onset (50 years) in this particular
family. Two unaffected individuals examined from this same family, both
predicted by linkage data to be non-carriers, showed the expected normal
sequence at this position. Linkage analysis that includes the C to T substitution
in Family 2 gives a lod score of 2.23 at a recombination fraction of 0. Using low
stringency cancer diagnostic criteria, we calculated a lod score of 2.53. These
data indicate the C to T substitution shows significant linkage to the HNPCC in
Family 2.
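
For readers unfamiliar with lod scores, the following Python sketch shows the quantity in its simplest textbook form: the base-10 logarithm of the likelihood of the data at a given recombination fraction relative to the likelihood under no linkage (recombination fraction 0.5). It assumes phase-known, fully informative meioses; the function name and the counts in the usage lines are hypothetical. The lod scores quoted above were derived from pedigree analyses that account for factors (phase, penetrance, diagnostic criteria) this sketch ignores.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """log10[L(theta) / L(0.5)] for phase-known, fully informative meioses.

    L(theta) = theta**k * (1 - theta)**(n - k), with k recombinants among
    n meioses. Simplifying assumption: every meiosis can be scored.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("recombinant count must lie between 0 and n_meioses")

    def log_term(count: int, p: float) -> float:
        if count == 0:
            return 0.0                    # p**0 == 1, even when p == 0
        if p == 0.0:
            return -math.inf              # data impossible at this theta
        return count * math.log10(p)

    log_l_theta = (log_term(n_recombinants, theta)
                   + log_term(n_meioses - n_recombinants, 1.0 - theta))
    log_l_null = n_meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical counts: each nonrecombinant meiosis adds log10(2) at theta = 0.
print(round(lod_score(10, 0, 0.0), 2))
print(round(lod_score(10, 1, 0.1), 2))
```

Under these simplifying assumptions, a lod score of 3 corresponds to odds of 1000:1 in favor of linkage, which is the commonly accepted level of significance referred to above.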

Figure 8 shows sequence chromatograms indicating a C to T
transition mutation that produces a non-conservative amino acid substitution at
position 44 of the hMLH1 protein. Sequence analysis of one unaffected (top
panels, plus and minus strands) and one affected individual (lower panels, plus
and minus strands) is presented. The position of the heterozygous nucleotide is
indicated by an arrow. Analysis of the sequence chromatograms indicates that
there is sufficient T signal in the C peak and enough A signal in the G peak for
the affected individuals to be heterozygous at this site.

To determine whether this C to T substitution was a polymorphism,
we sequenced this same exon amplified from the genomic DNA from 48
unrelated individuals and observed only the normal sequence. We have examined
an additional 26 unrelated individuals using allele specific oligonucleotide (ASO)
hybridization analysis. 33 The ASO sequences (SEQ ID NOS: 141 and 142,
respectively) which we used are:

5'-ACTTGTGGATTTTGC-3' and

5'-ACTTGTGAATTTTGC-3'.

Based upon direct DNA sequencing and ASO analysis, none of these 74 unrelated
individuals carry the C to T substitution. Therefore, the C to T substitution
observed in Family 2 individuals is not likely to be a polymorphism. As
mentioned above, we did not detect this same C to T substitution in affected
individuals from a second chromosome 3-linked family, Family 1. 3 We are
continuing to study individuals of Family 1 for mutations in hMLH1.
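
The relationship between the two ASO sequences can be checked with a few lines of Python. This is a minimal sketch that takes the two 15-mers exactly as printed above; the helper names are hypothetical, and the interpretation that the oligonucleotides are written on the strand complementary to the one on which the change reads as C to T is an inference from the sequences, not a statement from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> list:
    """List (1-based position, base in a, base in b) where a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


ASO_NORMAL = "ACTTGTGGATTTTGC"    # as printed for the normal sequence
ASO_VARIANT = "ACTTGTGAATTTTGC"   # as printed for the variant sequence

# The oligonucleotides differ at a single central position (G versus A) ...
print(mismatches(ASO_NORMAL, ASO_VARIANT))
# ... which reads as C versus T on the complementary strand.
print(mismatches(reverse_complement(ASO_NORMAL),
                 reverse_complement(ASO_VARIANT)))
```

Placing the variable nucleotide at the center of a short oligonucleotide is a common ASO design choice, since a central mismatch destabilizes hybridization more than one near an end.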
Table 4 below summarizes our experimental analysis of blood
samples from affected and unaffected individuals from Family 2 and unrelated
individuals.

Table 4

                                      Number of Individuals with
                                      C to T Mutation/
            Status                    Number of Individuals Tested

Family 2    Affected                  4/4
            Predicted Carriers        4/4
            Predicted Non-carriers    0/2

Unrelated Individuals                 0/74


Based on several criteria, we suggest that the observed C to T
substitution in the coding region of hMLH1 represents the mutation that is the
basis for HNPCC in Family 2. First, DNA sequence and ASO analysis did not
detect the C to T substitution in 74 unrelated individuals. Thus, the C to T
substitution is not simply a polymorphism. Second, the observed C to T
substitution is expected to produce a serine to phenylalanine change at position
44 (see Figure 9). This amino acid substitution is a non-conservative change in
a conserved region of the protein (Figures 3 and 9). Secondary structure
predictions using Chou-Fasman parameters suggest a helix-turn-beta sheet
structure with position 44 located in the turn. The observed Ser to Phe
substitution at position 44 lowers the prediction for this turn considerably,
suggesting that the predicted amino acid substitution alters the conformation of
the hMLH1 protein. The suggestion that the Ser to Phe substitution is a mutation
which confers cancer susceptibility is further supported by our experiments which
show that an analogous substitution (alanine to phenylalanine) in a yeast MLH1
gene results in a nonfunctional mismatch repair protein. In bacteria and yeast,
a mutation affecting DNA mismatch repair causes comparable increases in the
rate of spontaneous mutation including additions and deletions within
dinucleotide repeats. 4,5,11,13,14,15,16 In humans, mutation of hMSH2 is the basis of
chromosome-2 HNPCC, 1,2 tumors which show microsatellite instability and an
apparent defect in mismatch repair. 12 Chromosome 3-linked HNPCC is also
associated with instability of dinucleotide repeats. 3 Combined with these
observations, the high degree of conservation between the human MLH1 protein
and the yeast DNA mismatch repair protein MLH1 suggests that hMLH1 is likely
to function in DNA mismatch repair. During isolation of the hMLH1 gene, we
identified the hPMS1 gene. This observation suggests that mammalian DNA
mismatch repair, like that in yeast, 4 may require at least two MutL-like proteins.

It should be noted that it appears that different HNPCC families
show different mutations in the MLH1 gene. As explained above, affected
individuals in Family 1 showed "tight linkage" between HNPCC and a locus in the
region of 3p21-23. However, affected individuals in Family 1 do not have the C
to T mutation found in Family 2. It appears that the affected individuals in
Family 1 have a different mutation in their MLH1 gene. Further, we have used
the structure information and methods described in this application to find and
characterize another hMLH1 mutation which apparently confers cancer
susceptibility in heterozygous carriers of the mutant gene in a large English
HNPCC family. The hMLH1 mutation in the English family is a +1 T frameshift
which is predicted to lead to the synthesis of a truncated hMLH1 protein. Unlike,
for example, sickle cell anemia, in which essentially all known affected individuals
have the same mutation, multiple hMLH1 mutations have been discovered and
linked to cancer. Therefore, knowledge of the entire cDNA sequence for hMLH1
(and probably hPMS1), as well as genomic sequences, particularly those that
surround exons, will be useful and important for characterizing mutations in
families identified as exhibiting a high frequency of cancer.

Subsequent to our discovery of a cancer conferring mutation in
hMLH1, studies by others have resulted in the characterization of at least 5
additional mutations in hMLH1, each of which appears to have conferred cancer
susceptibility to individuals in at least one HNPCC family. For example,
Papadopoulos et al. identified such a mutation, characterized by an in-frame
deletion of 165 base pairs between codons 578 to 632. In another family,
Papadopoulos et al. observed an hMLH1 mutation, characterized by a frame shift
and substitution of new amino acids, namely, a 4 base pair deletion between
codons 727 and 728. Papadopoulos et al. also report an hMLH1 cancer linked
mutation, characterized by an extension of the COOH terminus, namely, a 4 base
pair insertion between codons 755 and 756. 38

In summary, we have shown that the DNA mismatch repair gene
hMLH1 is likely to be the hereditary nonpolyposis colon cancer gene
previously localized by linkage analysis to chromosome 3p21-23. 3 Availability of
the hMLH1 gene sequence will facilitate the screening of HNPCC families for
cancer-linked mutations. In addition, although loss of heterozygosity (LOH) of
linked markers is not a feature of either the 2p or 3p forms of HNPCC, 3,6 LOH
involving the 3p21.3-23 region has been observed in several human cancers. 24-26
This suggests the possibility that hMLH1 mutation may play some role in these
tumors.

Human PMS1

Human PMS1 was isolated using the procedures discussed with
reference to Figure 1. Figure 10 shows the entire hPMS1 cDNA nucleotide
sequence. Figure 11 shows an alignment of the predicted human and yeast PMS1
protein sequences. We determined by FISH analysis that human PMS1 is located
on chromosome 7. Subsequent to our discovery of hPMS1, others have identified
mutations in the gene which appear to confer HNPCC susceptibility. 39

Mouse MLH1

Using the procedure outlined above with reference to Figure 1, we
have determined a partial nucleotide sequence of mouse MLH1 cDNA, as shown
in Figure 12 (SEQ ID NO: 135). Figure 13 shows the corresponding predicted
amino acid sequence for mMLH1 protein (SEQ ID NO: 136) in comparison to
the predicted hMLH1 protein sequence (SEQ ID NO: 5). Comparison of the
mouse and human MLH1 proteins as well as the comparison of hMLH1 with
yeast MLH1 proteins, as shown in Figure 9, indicate a high degree of
conservation.

Mouse PMS1

Using the procedures discussed above with reference to Figure 1,
we isolated and sequenced the mouse PMS1 gene, as shown in Figure 14 (SEQ
ID NO: 137). This cDNA sequence encodes a predicted protein of 864 amino
acids (SEQ ID NO: 138), as shown in Figure 15, where it is compared to the
predicted amino acid sequence for hPMS1 (SEQ ID NO: 133). The degree of
identity between the predicted mouse and human PMS1 proteins is high, as would
be expected between two mammals. Similarly, as noted above, there is a strong
similarity between the human PMS1 protein and the yeast DNA mismatch repair
protein PMS1, as shown in Figure 11. The fact that yeast PMS1 and MLH1
function in yeast to repair DNA mismatches strongly suggests that human and
mouse PMS1 and MLH1 are also mismatch repair proteins.

Uses for Mouse MLH1 and PMS1

We believe our isolation and characterization of mMLH1 and
mPMS1 genes will have many research applications. For example, as already
discussed above, we have used our knowledge of the mPMS1 gene to produce
antibodies which react specifically with hPMS1. We have already explained that
antibodies directed to the human proteins, MLH1 or PMS1, may be used for both
research purposes and diagnostic purposes.

We also believe that our knowledge of mPMS1 and mMLH1 will be
useful for constructing mouse models in order to study the consequences of DNA
mismatch repair defects. We expect that mPMS1 or mMLH1 defective mice will
be highly prone to cancer because chromosome 2p and 3p-associated HNPCC are
each due to a defect in a mismatch repair gene. 1,2 As noted above, we have
already produced chimeric mice which carry an mPMS1 defective gene. We are
currently constructing mice heterozygous for mPMS1 or mMLH1 mutation. These
heterozygous mice should provide useful animal models for studying human
cancer, in particular HNPCC. The mice will be useful for analysis of both
intrinsic and extrinsic factors that determine cancer risk and progression. Also,
cancers associated with mismatch repair deficiency may respond differently to
conventional therapy in comparison to other cancers. Such animal models will
be useful for determining if differences exist and allow the development of
regimes for the effective treatment of these types of tumors. Such animal models
may also be used to study the relationship between hereditary versus dietary
factors in carcinogenesis.

Distinguishing Mutations From Polymorphisms

For studies of cancer susceptibility and for tumor identification and
characterization, it is important to distinguish "mutations" from "polymorphisms".
A "mutation" produces a "non-wild-type allele" of a gene. A non-wild-type allele
of a gene produces a transcript and/or a protein product that does not function
normally within a cell. "Mutations" can be any alteration in nucleotide sequence
including insertions, deletions, substitutions, and rearrangements.

"Polymorphisms", on the other hand, are sequence differences that
are found within the population of normally-functioning (i.e., "wild-type") genes.
Some polymorphisms result from the degeneracy of the nucleic acid code. That
is, given that most amino acids are encoded by more than one triplet codon, many
different nucleotide sequences can encode the same polypeptide. Other
polymorphisms are simply sequence differences that do not have a significant
effect on the function of the gene or encoded polypeptide. For example,
polypeptides can often tolerate small insertions or deletions, or "conservative"
substitutions in their amino acid sequence without significantly altering function
of the polypeptide.

"Conservative" substitutions are those in which a particular amino
acid is substituted by another amino acid of similar chemical characteristics. For
example, the amino acids are often characterized as "non-polar (hydrophobic)"
including alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and
methionine; "polar neutral", including glycine, serine, threonine, cysteine, tyrosine,
asparagine, and glutamine; "positively charged (basic)", including arginine, lysine,
and histidine; and "negatively charged (acidic)", including aspartic acid and
glutamic acid. A substitution of one amino acid for another amino acid in the
same group is generally considered to be "conservative", particularly if the side
groups of the two relevant amino acids are of a similar size.
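
The grouping just described lends itself to a simple lookup. The following Python sketch encodes the four groups exactly as listed above and reports whether a substitution stays within one group; the dictionary and function names are hypothetical, and the sketch deliberately ignores the side-group size qualifier, so a "conservative" result here is only a first approximation.

```python
# Amino acid groups as listed in the text (three-letter codes).
AMINO_ACID_GROUPS = {
    "non-polar (hydrophobic)": {"Ala", "Leu", "Ile", "Val", "Pro", "Phe", "Trp", "Met"},
    "polar neutral": {"Gly", "Ser", "Thr", "Cys", "Tyr", "Asn", "Gln"},
    "positively charged (basic)": {"Arg", "Lys", "His"},
    "negatively charged (acidic)": {"Asp", "Glu"},
}


def group_of(residue: str) -> str:
    """Return the name of the group containing the given residue."""
    for name, members in AMINO_ACID_GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues fall in the same group (side-group size ignored)."""
    return group_of(original) == group_of(substitute)


# The serine to phenylalanine change at position 44 crosses groups.
print(group_of("Ser"), "->", group_of("Phe"), is_conservative("Ser", "Phe"))
print(is_conservative("Asp", "Glu"))
```

By this grouping, the Ser to Phe substitution discussed above moves from the polar neutral group to the non-polar group, consistent with its description as a non-conservative change.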

The first step in identifying a mutation or polymorphism in a
mismatch repair gene sequence involves identification, using available techniques
including those described herein, of a mismatch repair gene (or gene fragment)
sequence that differs from a known, normal (e.g. wild-type) sequence of the same
mismatch repair gene (or gene fragment). For example, a hMLH1 gene (or gene
fragment) sequence could be identified that differs in at least one nucleotide
position from a known normal (e.g. wild-type) hMLH1 sequence such as any of
SEQ ID NOS: 6-24.

Mutations can be distinguished from polymorphisms using any of a
variety of methods, perhaps the most direct of which is data collection and
correlation with tumor development. That is, for example, a subject might be
identified whose hMLH1 gene sequence differs from a sequence reported in SEQ
ID NOS: 6-24, but who does not have cancer and has no family history of
cancer. Particularly if other, preferably senior, members of that subject's family
have hMLH1 gene sequences that differ from SEQ ID NOS: 6-24 in the same
way(s), it is likely that subject's hMLH1 gene sequence could be categorized as
a "polymorphism". If other, unrelated individuals are identified with the same
hMLH1 gene sequence and no family history of cancer, the categorization may
be confirmed.

Mutations that are responsible for conferring genetic susceptibility
to cancer can be identified because, among other things, such mutations are likely
to be present in all tissues of an affected individual and in the germ line of at
least one of that individual's parents, and are not likely to be found in unrelated
families with no history of cancer.

When distinguishing mutations from polymorphisms, it can
sometimes be valuable to evaluate a particular sequence difference in the
presence of at least one known mismatch repair gene mutation. In some
instances, a particular sequence change will not have a detectable effect (i.e., will
appear to be a polymorphism) when assayed alone, but will, for example, increase
the penetrance of a known mutation, such that individuals carrying both the
apparent polymorphism difference and a known mutation have higher probability
of developing cancer than do individuals carrying only the mutation. Sequence
differences that have such an effect are properly considered to be mutations,
albeit weak ones.

As discussed above and previously (U.S. Patent Application Nos.
08/168,877 and 08/209,521), mutations in mismatch repair genes or gene products
produce non-wild-type versions of those genes or gene products. Some
mutations can therefore be distinguished from polymorphisms by their functional
characteristics in in vivo or in vitro mismatch repair assays. Any available
mismatch repair assay can be used to analyze these characteristics. 49-63 It is
generally desirable to utilize more than one mismatch repair assay before
classifying a sequence change as a polymorphism, since some mutations will have
effects that will not be observed in all assays.

For example, a mismatch repair gene containing a mutation would
not be expected to be able to replace an endogenous copy of the same gene in
a host cell without detectably affecting mismatch repair in that cell; whereas a
mismatch repair gene containing a sequence polymorphism would be expected to
be able to replace an endogenous copy of the same gene in a host cell without
detectably affecting mismatch repair in that cell. We note that for such
"replacement" studies, it is generally desirable to introduce the gene to be tested
into a host cell of the same (or at least closely related) species as the cell from
which the test gene was derived, to avoid complications due to, for example, the
inability of a gene product from one species to interact with other mismatch
repair gene products from another species. Similarly, a mutant mismatch repair
protein would not be expected to function normally in an in vitro mismatch repair
system (preferably from a related organism); whereas a polymorphic mismatch
repair protein would be expected to function normally.

The methods described herein and previously allow identification
of different kinds of mismatch repair gene mutations. The following examples
illustrate protocols for distinguishing mutations from polymorphisms in DNA
mismatch repair genes.

EXAMPLE 3: We have developed a system for testing in yeast, S.
cerevisiae, the functional significance of mutations found in either the hMLH1 or
hPMS1 genes. The system is described in this application using as an example
the serine (SER) to phenylalanine (PHE) causing mutation in hMLH1 that we
found in a family with HNPCC, as described above. We have derived a yeast
strain that is essentially deleted for its MLH1 gene and hence is a strong
mutator (i.e., 1000-fold above the normal rate in a simple genetic marker assay
involving reversion from growth dependence on a given amino acid to
independence (reversion of the hom3-10 allele, Prolla, Christie and Liskay, Mol
Cell Biol, 14:407-415, 1994)). When we placed the normal yeast MLH1 gene
(complete with all known control regions) on a yeast plasmid that is stably
maintained as a single copy into the MLH1-deleted strain, the mutator phenotype
was fully corrected using the reversion to amino acid independence assay. However,
if we introduce a deleted copy of the yeast MLH1 there is no correction. We next
tested the mutation that in the HNPCC family caused a SER to PHE alteration.
We found that the resultant mutant yeast protein cannot correct the mutator
phenotype, strongly suggesting that the alteration from the wild-type gene
sequence probably confers cancer susceptibility, and is therefore classified as a
mutation, not a polymorphism. We subsequently tested proteins engineered to
contain other amino acids at the "serine" position and found that most changes
result in a fully mutant, or at least partially mutant phenotype.

As other "point" mutations in MLH1 and PMS1 genes are found in
cancer families, they can be engineered into the appropriate yeast homolog gene
and their consequence on protein function studied. In addition, we have
identified a number of highly conserved amino acids in both the MLH1 and PMS1
genes. We also have evidence that hMLH1 interacts with yeast PMS1. This
finding raises the possibility that mutations observed in the hMLH1 gene can be
more directly tested in the yeast system. We plan to systematically make
mutations that will alter the amino acid at these conserved positions and
determine what amino acid substitutions are tolerated and which are not. By
collecting mutation information relating to hMLH1 and hPMS1, both by
determining and documenting actual found mutations in HNPCC families, and by
artificially synthesizing mutants for testing in experimental systems, it may be
eventually possible to practice a cancer susceptibility testing protocol which, once
the individual's hMLH1 or hPMS1 structure is determined, only requires
comparison of that structure to known mutation versus polymorphism data.

EXAMPLE 4: Another method, which we have employed to study
physical interactions between hMLH1 and hPMS1, can also be used to study
whether a particular alteration in a gene product results in a change in the degree
of protein-protein interaction. Information concerning changes in protein-protein
interaction may demonstrate or confirm whether a particular genomic variation
is a mutation or a polymorphism. Following our lab's findings on the interaction
between yeast MLH1 and PMS1 proteins in vitro and in vivo (U.S. Patent
Application Serial No. 08/168,877), the interaction between the human
counterparts of these two DNA mismatch repair proteins was tested. The human
MLH1 and human PMS1 proteins were tested for in vitro interaction using
maltose binding protein (MBP) affinity chromatography. hMLH1 protein was
prepared as an MBP fusion protein, immobilized on an amylose resin column via
the MBP, and tested for binding to hPMS1, synthesized in vitro. The hPMS1
protein bound to the MBP-hMLH1 matrix, whereas control proteins showed no
affinity for the matrix. When the hMLH1 protein, translated in vitro, was passed
over an MBP-hPMS1 fusion protein matrix, the hMLH1 protein bound to the
MBP-hPMS1 matrix, whereas control proteins did not.

Potential in vivo interactions between hMLH1 and hPMS1 were
tested using the yeast "two hybrid" system. 28 Our initial results indicate that
hMLH1 and hPMS1 interact in vivo in yeast. The same system can also be used
to detect changes in protein-protein interaction which result from changes in gene
or gene product structure and which have yet to be classified as either a
polymorphism or a mutation which confers cancer susceptibility.


Detection of HNPCC Families and Their Mutation(s) 

It has been estimated that approximately 1,000,000 individuals in the
United States carry (are heterozygous for) an HNPCC mutant gene. 29
Furthermore, estimates suggest that 50-60% of HNPCC families segregate
mutations in the MSH2 gene that resides on chromosome 2p. 1,2 Another
significant fraction appear to be associated with the HNPCC gene that maps to
chromosome 3p21-22, presumably due to hMLH1 mutations such as the C to T
transition discussed above. Identification of families that segregate mutant alleles
of either the hMSH2 or hMLH1 gene, and the determination of which individuals
in these families actually have the mutation will be of great utility in the early
intervention into the disease. Such early intervention will likely include early
detection through screening and aggressive follow-up treatment of affected
individuals. In addition, determination of the genetic basis for both familial and
sporadic tumors could direct the method of therapy in the primary tumor, or in
recurrences.

Initially, HNPCC candidate families will be diagnosed partly through
the study of family histories, most likely at the local level, e.g., by hospital
oncologists. One criterion for HNPCC is the observation of microsatellite
instability in an individual's tumors. 3,6 The presenting patient would be tested for
mutations in hMSH2, hMLH1, hPMS1 and other genes involved in DNA
mismatch repair as they are identified. This is most easily done by sampling
blood from the individual. Also highly useful would be freshly frozen tumor
tissue. It is important to note for the screening procedure that affected
individuals are heterozygous for the offending mutation in their normal tissues.

The available tissues, e.g., blood and tumor, are worked up for 
PCR-based mutation analysis using one or both of the following procedures: 

1) Linkage analysis with a microsatellite marker tightly linked
to the hMLH1 gene.

One approach to identify cancer prone families with a hMLH1
mutation is to perform linkage analysis with a highly polymorphic marker located
within or tightly linked to hMLH1. Microsatellites are highly polymorphic and
therefore are very useful as markers in linkage analysis. Because we possess the
hMLH1 gene on a single large genomic fragment in a P1 phage clone (~100 kbp),
it is very likely that one or more microsatellites, e.g., tracts of dinucleotide
repeats, exist within, or very close to, the hMLH1 gene. At least one such
microsatellite has been reported. 38 Once such markers have been identified, PCR
primers will be designed to amplify the stretches of DNA containing the
microsatellites. DNA of affected and unaffected individuals from a family with
a high frequency of cancer will be screened to determine the segregation of the
MLH1 markers and the presence of cancer. The resulting data can be used to
calculate a lod score and hence determine the likelihood of linkage between
hMLH1 and the occurrence of cancer. Once linkage is established in a given
family, the same polymorphic marker can be used to test other members of the
kindred for the likelihood of their carrying the hMLH1 mutation.
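
Once genomic sequence from such a clone is available, candidate dinucleotide repeat tracts can be located computationally. The Python sketch below is one hedged way to do this with a regular expression; the function name, the minimum tract length of 8 repeat units, and the example sequence are all assumptions chosen for illustration and are not taken from our procedures.

```python
import re


def dinucleotide_repeats(seq: str, min_units: int = 8) -> list:
    """Find tracts of a dinucleotide repeated at least `min_units` times.

    Returns (0-based start, repeat unit, number of units) for each tract.
    Homopolymer runs such as AAAA... are skipped.
    """
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    # One captured dinucleotide followed by (min_units - 1) or more copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue                       # homopolymer, not a dinucleotide tract
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


# Hypothetical sequence containing a (CA)12 tract.
example = "GGTT" + "CA" * 12 + "GGTTCC"
print(dinucleotide_repeats(example))
```

Primers flanking each reported tract could then be chosen from the surrounding unique sequence, as described above.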

2) Sequencing of reverse transcribed cDNA.

a) RNA from affected, unaffected and unrelated
individuals is reverse transcribed (RT'd), followed by PCR to amplify the cDNA
in 4-5 overlapping portions. 34-37 It should be noted that for the purposes of PCR,
many different oligonucleotide primer pair sequences may potentially be used to
amplify relevant portions of an individual's hMLH1 or hPMS1 gene for genetic
screening purposes. With the knowledge of the cDNA structures for the genes,
it is a straightforward exercise to construct primer pairs which are likely to be
effective for specifically amplifying selected portions of the gene. While primer
sequences are typically between 20 and 30 bases long, it may be possible to use
shorter primers, potentially as small as approximately 13 bases, to amplify
specifically selected gene segments. The principal limitation on how small a
primer sequence may be is that it must be long enough to hybridize specifically
to the targeted gene segment. Specificity of PCR is generally improved by
lengthening primers and/or employing nested pairs of primers.

The PCR products, in total representing the entire cDNA, are then
sequenced and compared to known wild-type sequences. In most cases a
mutation will be observed in the affected individual. Ideally, the nature of the
mutation will indicate that it is likely to inactivate the gene product. Otherwise,
it must be determined whether the alteration is a mutation rather than simply a
polymorphism.

b) Certain mutations, e.g., those affecting splicing or resulting
in translation stop codons, can destabilize the messenger RNA produced from the
mutant gene and hence compromise the normal RT-based mutation detection
method. One recently reported technique can circumvent this problem by testing
whether the mutant cDNA can direct the synthesis of normal length protein in a
coupled in vitro transcription/translation system. 32

3) Direct sequencing of genomic DNA 

A second route to detect mutations relies on examining the exons 
and the intron/exon boundaries by PCR cycle sequencing directly off a DNA 
template. 1,2 This method requires the use of oligonucleotide pairs, such as those 
described in Tables 2 and 3 above, that amplify individual exons for direct PCR 
cycle sequencing. The method depends upon genomic DNA sequence 
information at each intron/exon boundary (50bp, or greater, for each boundary). 
The advantage of the technique is twofold. First, because DNA is more stable
than RNA, the condition of the material used for PCR is not as important as it
is for RNA-based protocols. Second, almost any mutation within the actual
transcribed region of the gene, including those in an intron affecting splicing, will
be detectable.

For each candidate gene, mutation detection may require knowledge 
of both the entire cDNA structure, and all intron/exon boundaries of the genomic 
structure. With such information, the type of causal mutation in a particular 
family can be determined. In turn, a more specific and efficient mutation 
detection scheme can be adapted for the particular family. Screening for the 
disease (HNPCC) is complex because it has a genetically heterogeneous basis in 
the sense that more than one gene is involved, and for each gene, multiple types 
of mutations are involved. 2 Any given family is highly likely to segregate one 
particular mutation. However, as the nature of the mutation in multiple families 
is determined, the spectrum of the most prevalent mutations in the population 
will be determined. In general, determination of the most frequent mutations will 
direct and streamline mutation detection.

Because HNPCC is so prevalent in the human population, carrier
detection at birth could become part of standardized neonatal testing. Families 
at risk can be identified and all members not previously tested can be tested. 
Eventually, all affected kindreds could be determined. 

Mode of Mutation Screening and Testing 
DNA-based Testing 

Initial testing, including identifying likely HNPCC families by 
standard diagnosis and family history study, will likely be done in local and 
smaller DNA diagnosis laboratories. However, large-scale testing of multiple
family members, and certainly population-wide testing, will ultimately require
large, efficient, centralized commercial facilities.

Tests will be developed based on the determination of the most
common mutations for the major genes underlying HNPCC, including at least the
hMSH2 gene on chromosome 2p and the hMLH1 gene on chromosome 3p. A
variety of tests are likely to be developed. For example, one possibility is a set
of tests employing oligonucleotide hybridizations that distinguish the normal vs.
mutant alleles. 33 As already noted, our knowledge of the nucleotide structures for
the hMLH1, hPMS1 and hMSH2 genes makes possible the design of numerous
oligonucleotide primer pairs which may be used to amplify specific portions of an
individual's mismatch repair gene for genetic screening and cancer risk analysis.
Our knowledge of the genes' structures also makes possible the design of labeled
probes which can be quickly used to determine the presence or absence of all or
a portion of one of the DNA mismatch repair genes. For example, allele-specific
oligomer (ASO) probes may be designed to distinguish between alleles. ASOs are
short DNA segments that are identical in sequence except for a single base
difference that reflects the difference between normal and mutant alleles. Under
the appropriate DNA hybridization conditions, these probes can recognize a
single base difference between two otherwise identical DNA sequences. Probes
can be labeled radioactively or with a variety of non-radioactive reporter
molecules, for example, fluorescent or chemiluminescent moieties. Labeled
probes are then used to analyze the PCR sample for the presence of the
disease-causing allele. The presence or absence of several different disease-causing genes
can readily be determined in a single sample. The probe must be
long enough to avoid non-specific binding to nucleotide sequences other than the
target. All tests will depend ultimately on accurate and complete structural
information relating to hMLH1, hMSH2, hPMS1 and other DNA mismatch repair
genes implicated in HNPCC.
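
The construction of a matched pair of allele-specific probes may be illustrated as follows. This sketch simply cuts a window around a known single-base difference and writes the normal and mutant versions of that window. The function name, the 0-based position convention and the default flank of 9 bases on each side (giving a 19-base probe) are assumptions made for illustration; they are not probe designs disclosed in this application, and actual probe length and hybridization conditions would have to be determined empirically.

```python
def aso_probe_pair(normal: str, position: int, mutant_base: str, flank: int = 9):
    """Return (normal_probe, mutant_probe) centred on a single-base difference.

    `position` is the 0-based index of the variable base in `normal`.
    """
    normal = normal.upper()
    mutant_base = mutant_base.upper()
    if len(mutant_base) != 1 or mutant_base not in "ACGT":
        raise ValueError("mutant_base must be one of A, C, G, T")
    if position - flank < 0 or position + flank >= len(normal):
        raise ValueError("not enough flanking sequence around position")
    if normal[position] == mutant_base:
        raise ValueError("mutant base is identical to the normal base")
    start, end = position - flank, position + flank + 1
    normal_probe = normal[start:end]
    mutant_probe = normal[start:position] + mutant_base + normal[position + 1:end]
    return normal_probe, mutant_probe
```

The two probes returned differ at exactly one base, namely the central one, which corresponds to the single base difference described above.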

Protein Detection-Based Screening 

Tests based on the functionality of the protein product, per se, may
also be used. The protein-examining tests will most likely utilize antibody
reagents specific to the hMLH1, hPMS1 and hMSH2 proteins or to other
related "cancer" gene products as they are identified.

For example, a frozen tumor specimen can be cross-sectioned and
prepared for antibody staining using indirect fluorescence techniques. Certain
gene mutations are expected to alter or destabilize the protein structure
sufficiently to give an altered or reduced signal after antibody staining.
It is likely that such tests will be performed in cases where gene involvement in
a family's cancer has yet to be established. We are in the process of developing
diagnostic monoclonal antibodies against the human MLH1 and PMS1 proteins.
We are overexpressing MLH1 and PMS1 human proteins in bacteria. We will
purify the proteins, inject them into mice and derive protein-specific monoclonal
antibodies which can be used for diagnostic and research purposes.

Identification and Characterization of DNA Mismatch Repair Tumors 

In addition to their usefulness in diagnosing cancer susceptibility in 
a subject, nucleotide sequences that are homologous to a bacterial mismatch 
repair gene can be valuable for, among other things, use in the identification and 
characterization of mismatch-repair-defective tumors. Such identification and 
characterization is valuable because mismatch-repair-defective tumors may 
respond better to particular therapy regimens. For example, mismatch-repair- 
defective tumors might be sensitive to DNA damaging agents, especially when 
administered in combination with other therapeutic agents.

Defects in mismatch repair genes need not be present throughout
an individual's tissues to contribute to tumor formation in that individual.
Spontaneous mutation of a mismatch repair gene in a particular cell of a tissue can
contribute to tumor formation in that tissue. In fact, at least in some cases, a
single mutation in a mismatch repair gene is not sufficient for tumor development.
In such instances, an individual with a single mutation in a mismatch repair gene
is susceptible to cancer, but will not develop a tumor until a secondary mutation
occurs. Additionally, in some instances, the same mismatch repair gene mutation
that is strictly tumor-associated in an individual will be responsible for conferring
cancer susceptibility in a family with a hereditary predisposition to cancer
development.

In yet another aspect of the invention, the sequence information we 
have provided can be used with methods known in the art to analyze tumors (or 
tumor cell lines) and to identify tumor-associated mutations in mismatch repair 
genes. Preferably, it is possible to demonstrate that these tumor-associated 
mutations are not present in non-tumor tissues from the same individual. The 
information described in this application is particularly useful for the 
identification of mismatch repair gene mutations within tumors (or tumor cell 
lines) that display genomic instability of short repeated DNA elements. 

The sequence information and testing protocols of the present
invention can also be used to determine whether two tumors are related, i.e., 
whether a second tumor is the result of metastasis from an earlier found first 
tumor which exhibits a particular DNA mismatch repair gene mutation. 

Isolating Additional Genes of Related Function 

Proteins that interact physically with hMLH1 and/or hPMS1
are likely to be involved in DNA mismatch repair. By analogy to hMLH1 and
hMSH2, mutations in the genes which encode such proteins would be strong
candidates for potential cancer linkage. A powerful molecular genetic approach
using yeast, referred to as a "two-hybrid system", allows the relatively rapid
detection and isolation of genes encoding proteins that interact with a gene
product of interest, e.g., hMLH1. 28

The two-hybrid system involves two plasmid vectors each intended
to encode a fusion protein. Each of the two vectors contains a portion, or
domain, of a transcription activator. The yeast cell used in the detection scheme
contains a "reporter" gene. Neither activator domain alone can activate transcription.
However, if the two domains are brought into close proximity then transcription
may occur. The cDNA for the protein of interest, e.g., hMLH1, is inserted within
a reading frame in one of the vectors. This is termed the "bait". A library of
human cDNAs, inserted into a second plasmid vector so as to make fusions with
the other domain of the transcriptional activator, is introduced into the yeast cells
harboring the "bait" vector. If a particular yeast cell receives a library member
that contains a human cDNA encoding a protein that interacts with hMLH1
protein, this interaction will bring the two domains of the transcriptional activator
into close proximity and activate transcription of the reporter gene, and the yeast cell
will turn blue. Next, the insert is sequenced to determine whether it is related to
any sequence in the database. The same procedure can be used to identify yeast
proteins in DNA mismatch repair or a related process. Performing the yeast and
human "hunts" in parallel has certain advantages. The function of novel yeast
homologs can be quickly determined in yeast by gene disruption and subsequent
examination of the genetic consequences of being defective in the newfound
gene. These yeast studies will help guide the analysis of novel human "hMLH1- or
hPMS1-interacting" proteins in much the same way that the yeast studies on PMS1
and MLH1 have influenced our studies of the human MLH1 and PMS1 genes.

Production of Antibodies 

By using our knowledge of the DNA sequences for hMLH1 and
hPMS1, we can synthesize all or portions of the predicted protein structures for
the purpose of producing antibodies. One important use for antibodies directed
to hMLH1 and hPMS1 proteins will be for capturing other proteins which may
be involved in DNA mismatch repair. For example, by employing
coimmunoprecipitation techniques, antibodies directed to either hMLH1 or hPMS1 may be
used to precipitate the target protein along with other associated proteins which
are functionally and/or physically related. Another important use for antibodies
will be for the purpose
of isolating hMLH1 and hPMS1 proteins from tumor tissue. The hMLH1 and
hPMS1 proteins from tumors can then be characterized for the purpose of
determining appropriate treatment strategies.

We are in the process of developing monoclonal antibodies directed
to the hMLH1 and hPMS1 proteins.

EXAMPLE 5: We have also used the following procedure to 
produce polyclonal antibodies directed to the human and mouse forms of PMS1 
protein. 

We inserted a 3' fragment of the mouse PMS1 cDNA in the
bacterial expression plasmid vector, pET (Novagen, Madison, WI). The expected
expressed portion of the mouse PMS1 protein corresponds to a region of
approximately 200 amino acids at the end of the PMS1 protein. This portion of
the mPMS1 is conserved with yeast PMS1 but is not conserved with either the
human or the mouse MLH1 proteins. One reason that we selected this portion
of the PMS1 protein for producing antibodies is that we did not want the resulting
antibodies to cross-react with MLH1. The mouse PMS1 protein fragment was
highly expressed in E. coli and purified from a polyacrylamide gel, and the eluted
protein was then prepared for animal injections. Approximately 2 mg of the
PMS1 protein fragment was sent to the Pocono Rabbit Farm (PA) for injections
into rabbits. Sera collected from the rabbits at multiple times were titered against the PMS1
antigen using standard ELISA techniques. Rabbit antibodies specific to mouse
PMS1 protein were affinity-purified using columns containing immobilized mouse
PMS1 protein. The affinity-purified polyclonal antibody preparation was tested
further using Western blotting and dot blotting. We found that the polyclonal
antibodies recognized not only the mouse PMS1 protein, but also the human
PMS1 protein, which is very similar. Based upon the Western blots, there is no
indication that other proteins were recognized strongly by our antibody, including
either the human or mouse MLH1 proteins.

DNA Mismatch Repair Defective Mice 
EXAMPLE 6: In order to create an experimental model system for
studying DNA mismatch repair defects and resultant cancer in a whole animal
system we have derived DNA mismatch repair defective mice using embryonic
stem (ES) cell technology. Using genomic DNA containing a portion of the
mPMS1 gene we constructed a vector that upon homologous recombination
causes disruption of the chromosomal mPMS1 gene. Mouse ES cells from the
129 mouse strain were confirmed to contain a disrupted mPMS1 allele. The ES
cells were injected into C57/BL6 host blastocysts to produce animals that were
chimeric, or a mixture of 129 and C57/BL6 cells. The incorporation of the ES
cells was determined by the presence of patches of agouti coat coloring (indicative
of ES cell contribution). All male chimeras were bred with C57/BL6 female
mice.

Subsequently, twelve offspring (F2) were born in which the agouti
coat color was detected, indicating the germline transmission of genetic material
from the ES cells. Analysis of DNA extracted from the tail tips of the twelve
offspring indicated that six of the animals were heterozygous (contained one
wild-type and one mutant allele) for the mPMS1 mutation. Of the six heterozygous
animals, three were females (animals F2-8, F2-11 and F2-12) and three were males
(F2-6, F2-10 and F2-13). Four breeding pens were set up to obtain mice that were
homozygous for the mPMS1 mutation, and additional heterozygous mice. Breeding
pen #1, which contained animals F2-11 and F2-10, yielded a total of thirteen mice
in three litters, four of which have been genotyped. Breeding pen #2 (animals
F2-8 and F2-13) gave twenty-two animals in three litters, three of which have
been genotyped. Of the seven animals genotyped, three homozygous female
animals have been identified. One animal died at six weeks of age from unknown
causes. The remaining homozygous females are alive and healthy at twelve weeks
of age. The results indicate that mPMS1 homozygous defective mice are viable.

Breeding pens #3 and #4 were used to backcross the mPMS1
mutation into the C57/BL6 background. Breeding pen #3 (animal F2-12 crossed
to a C57/BL6 mouse) produced twenty-one animals in two litters, nine of which
have been genotyped. Breeding pen #4 (animal F2-6 crossed with a C57/BL6
mouse) gave eight mice. In addition, the original male chimera (breeding pen
#5) has produced thirty-one additional offspring.

To genotype the animals, a series of PCR primers has been
developed that are used to identify mutant and wild-type mPMS1 genes. They
are (SEQ ID NOS: 143-148, respectively):

Primer 1: 5'-TTCGGTGACAGATTTGTAAATG-3'
Primer 2: 5'-TTTACGGAGCCCTGGC-3'
Primer 3: 5'-TCACCATAAAAATAGTTTCCCG-3'
Primer 4: 5'-TCCTGGATCATATTTTCTGAGC-3'
Primer 5: 5'-TTTCAGGTATGTCCTGTTACCC-3'
Primer 6: 5'-TGAGGCAGCTTTTAAGAAACTC-3'

Primers 1+2 (5' targeted)
Primers 1+3 (5' untargeted)
Primers 4+5 (3' targeted)
Primers 4+6 (3' untargeted)
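
The way in which the four primer pairings listed above might be combined into a genotype call can be sketched as follows. The sketch rests on an assumption that is not spelled out in the text, namely that a "targeted" pairing yields a PCR product only from the disrupted allele and an "untargeted" pairing yields a product only from the wild-type allele. The dictionary layout, the function names and the "discordant" and "no call" outcomes are hypothetical and are included only to make the logic explicit.

```python
# Primer pairings as listed in the text, keyed by the end of the locus assayed.
PAIRS = {
    "5'": {"targeted": "1+2", "untargeted": "1+3"},
    "3'": {"targeted": "4+5", "untargeted": "4+6"},
}


def call_end(products: dict, end: str):
    """Call the genotype from one end of the locus; None if no product was seen."""
    targeted = products.get(PAIRS[end]["targeted"], False)
    untargeted = products.get(PAIRS[end]["untargeted"], False)
    if targeted and untargeted:
        return "heterozygous"
    if targeted:
        return "homozygous mutant"
    if untargeted:
        return "wild-type"
    return None


def call_genotype(products: dict) -> str:
    """Combine the 5' and 3' assays; `products` maps a pairing to True/False."""
    calls = {call_end(products, end) for end in PAIRS}
    calls.discard(None)
    if not calls:
        return "no call"
    if len(calls) > 1:
        return "discordant"  # the two ends disagree; the animal should be retested
    return calls.pop()
```

For example, an animal giving products with pairings 1+2 and 1+3 would be called heterozygous, whereas an animal giving products only with pairings 1+2 and 4+5 would be called homozygous for the disrupted allele.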

The mice we have developed provide an animal model system for
studying the consequences of defects in DNA mismatch repair and resultant
HNPCC. The long-term survival of mice homozygous and heterozygous for the
mPMS1 mutation and the types and timing of tumors in these mice will be
determined. The mice will be screened daily for any sign of cancer onset,
as indicated by a hunched appearance in combination with deterioration in coat
condition. These mice carrying the mPMS1 mutation will be used to test the effects
of other factors, environmental and genetic, on tumor formation. For example,
the effect of diet on colon and other types of tumors can be compared for normal
mice versus those carrying the mPMS1 mutation in either the heterozygous or
homozygous genotype. In addition, the mPMS1 mutation can be put into different
genetic backgrounds to learn about interactions between genes of the mismatch
repair pathway and other genes involved in human cancer, for example, p53.
Mice carrying mPMS1 mutations will also be useful for testing the efficacy of
somatic gene therapy on the cancers that arise in mice, for example, the expected
colon cancers. Further, isogenic fibroblast cell lines from the homozygous and
heterozygous mPMS1 mice can be established for use in various cellular studies,
including the determination of spontaneous mutation rates.

We are currently constructing a vector for disrupting the mouse
mMLH1 gene to derive mice carrying mutations in mMLH1. We will compare
mice carrying defects in mPMS1 to mice carrying defects in mMLH1. In addition,
we will construct mice that carry mutations in both genes to see whether there is
a synergistic effect of having mutations in two HNPCC genes. Other studies on
the mMLH1 mutant mice will be as described above for the mPMS1 mutant mice.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Liskay, Robert M.
Bronner, C. Eric
Baker, Sean M.
Bollag, Roni J.
Kolodner, Richard D.

(ii) TITLE OF INVENTION: COMPOSITIONS AND METHODS RELATING TO DNA
MISMATCH REPAIR GENES

(iii) NUMBER OF SEQUENCES: 148

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Kolisch, Hartwell, Dickinson, McCormack & Heuser
(B) STREET: 520 S.W. Yamhill Street, Suite 200
(C) CITY: Portland
(D) STATE: Oregon
(E) COUNTRY: U.S.A.
(F) ZIP: 97204

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Van Rysselberghe, Pierre C.
(B) REGISTRATION NUMBER: 33,557
(C) REFERENCE/DOCKET NUMBER: OHSU 306B

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (503) 224-6655
(B) TELEFAX: (503) 295-6679
(C) TELEX: 360619
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 361 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Met Pro Ile Gln Val Leu Pro Pro Gln Leu Ala Asn Gln Ile Ala Ala
1 5 10 15

Gly Glu Val Val Glu Arg Pro Ala Ser Val Val Lys Glu Leu Val Glu
20 25 30

Asn Ser Leu Asp Ala Gly Ala Thr Arg Val Asp Ile Asp Ile Glu Arg
35 40 45

Gly Gly Ala Lys Leu Ile Arg Ile Arg Asp Asn Gly Cys Gly Ile Lys
50 55 60

Lys Glu Glu Leu Ala Leu Ala Leu Ala Arg His Ala Thr Ser Lys Ile
65 70 75 80

Ala Ser Leu Asp Asp Leu Glu Ala Ile Ile Ser Leu Gly Phe Arg Gly
85 90 95

Glu Ala Leu Ala Ser Ile Ser Ser Val Ser Arg Leu Thr Leu Thr Ser
100 105 110

Arg Thr Ala Glu Gln Ala Glu Ala Trp Gln Ala Tyr Ala Glu Gly Arg
115 120 125

Asp Met Asp Val Thr Val Lys Pro Ala Ala His Pro Val Gly Thr Thr
130 135 140

Leu Glu Val Leu Asp Leu Phe Tyr Asn Thr Pro Ala Arg Arg Lys Phe
145 150 155 160

Met Arg Thr Glu Lys Thr Glu Phe Asn His Ile Asp Glu Ile Ile Arg
165 170 175

Arg Ile Ala Leu Ala Arg Phe Asp Val Thr Leu Asn Leu Ser His Asn
180 185 190

Gly Lys Leu Val Arg Gln Tyr Arg Ala Val Ala Lys Asp Gly Gln Lys
195 200 205

Glu Arg Arg Leu Gly Ala Ile Cys Gly Thr Pro Phe Leu Glu Gln Ala
210 215 220

Leu Ala Ile Glu Trp Gln His Gly Asp Lys Thr Lys Arg Gly Trp Val
225 230 235 240

Ala Asp Pro Asn His Thr Thr Thr Ala Leu Thr Glu Ile Gln Tyr Cys
245 250 255

Tyr Val Asn Gly Arg Met Met Arg Asp Arg Leu Ile Asn His Ala Ile
260 265 270

Arg Gln Ala Cys Glu Asp Lys Leu Gly Ala Asp Gln Gln Pro Ala Phe
275 280 285

Val Leu Tyr Leu Glu Ile Asp Pro His Gln Val Asp Val Asn Val His
290 295 300

Pro Ala Lys His Glu Val Arg Phe His Gln Ser Arg Leu Val His Asp
305 310 315 320

Phe Ile Tyr Gln Gly Val Leu Ser Val Leu Gln Gln Gln Thr Glu Thr
325 330 335

Ala Leu Pro Leu Glu Glu Ile Ala Pro Ala Pro Arg His Val Gln Glu
340 345 350

Asn Arg Ile Ala Ala Gly Arg Asn His
355 360

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 538 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ser His Ile Ile Glu Leu Pro Glu Met Leu Ala Asn Gln Ile Ala
1 5 10 15

Ala Gly Glu Val Ile Glu Arg Pro Ala Ser Val Cys Lys Glu Leu Val
20 25 30

Glu Asn Ala Ile Asp Ala Gly Ser Ser Gln Ile Ile Ile Glu Ile Glu
35 40 45

Glu Ala Gly Leu Lys Lys Val Gln Ile Thr Asp Asn Gly His Gly Ile
50 55 60

Ala His Asp Glu Val Glu Leu Ala Leu Arg Arg His Ala Thr Ser Lys
65 70 75 80

Ile Lys Asn Gln Ala Asp Leu Phe Arg Ile Arg Thr Leu Gly Phe Arg
85 90 95

Gly Glu Ala Leu Pro Ser Ile Ala Ser Val Ser Val Leu Thr Leu Leu
100 105 110

Thr Ala Val Asp Gly Ala Ser His Gly Thr Lys Leu Val Ala Arg Gly
115 120 125

Gly Glu Val Glu Glu Val Ile Pro Ala Thr Ser Pro Val Gly Thr Lys
130 135 140

Val Cys Val Glu Asp Leu Phe Phe Asn Thr Pro Ala Arg Leu Lys Tyr
145 150 155 160

Met Lys Ser Gln Gln Ala Glu Leu Ser His Ile Ile Asp Ile Val Asn
165 170 175

Arg Leu Gly Leu Ala His Pro Glu Ile Ser Phe Ser Leu Ile Ser Asp
180 185 190

Gly Lys Glu Met Thr Arg Thr Ala Gly Thr Gly Gln Leu Arg Gln Ala
195 200 205

Ile Ala Gly Ile Tyr Gly Leu Val Ser Ala Lys Lys Met Ile Glu Ile
210 215 220

Glu Asn Ser Asp Leu Asp Phe Glu Ile Ser Gly Phe Val Ser Leu Pro
225 230 235 240

Glu Leu Thr Arg Ala Asn Arg Asn Tyr Ile Ser Leu Phe Ile Asn Gly
245 250 255

Arg Tyr Ile Lys Asn Phe Leu Leu Asn Arg Ala Ile Leu Asp Gly Phe
260 265 270

Gly Ser Lys Leu Met Val Gly Arg Phe Pro Leu Ala Val Ile His Ile
275 280 285

His Ile Asp Pro Tyr Leu Ala Asp Val Asn Val His Pro Thr Lys Gln
290 295 300

Glu Val Arg Ile Ser Lys Glu Lys Glu Leu Met Thr Leu Val Ser Glu
305 310 315 320

Ala Ile Ala Asn Ser Leu Lys Glu Gln Thr Leu Ile Pro Asp Ala Leu
325 330 335

Glu Asn Leu Ala Lys Ser Thr Val Arg Asn Arg Glu Lys Val Glu Gln
340 345 350

Thr Ile Leu Pro Leu Ser Phe Pro Glu Leu Glu Phe Phe Gly Gln Met
355 360 365

His Gly Thr Tyr Leu Phe Ala Gln Gly Arg Asp Gly Leu Tyr Ile Ile
370 375 380

Asp Gln His Ala Ala Gln Glu Arg Val Lys Tyr Glu Glu Tyr Arg Glu
385 390 395 400

Ser Ile Gly Asn Val Asp Gln Ser Gln Gln Gln Leu Leu Val Pro Tyr
405 410 415

Ile Phe Glu Phe Pro Ala Asp Asp Ala Leu Arg Leu Lys Glu Arg Met
420 425 430

Pro Leu Leu Glu Glu Val Gly Val Phe Leu Ala Glu Tyr Gly Glu Asn
435 440 445

Gln Phe Ile Leu Arg Glu His Pro Ile Trp Met Ala Glu Glu Glu Ile
450 455 460

Glu Ser Gly Ile Tyr Glu Met Cys Asp Met Leu Leu Leu Thr Lys Glu
465 470 475 480

Val Ser Ile Lys Lys Tyr Arg Ala Glu Leu Ala Ile Met Met Ser Cys
485 490 495

Lys Arg Ser Ile Lys Ala Asn His Arg Ile Asp Asp His Ser Ala Arg
500 505 510

Gln Leu Leu Tyr Gln Leu Ser Gln Cys Asp Asn Pro Tyr Asn Cys Pro
515 520 525

His Gly Arg Pro Val Leu Val His Phe Thr
530 535

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 607 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Phe His His Ile Glu Asn Leu Leu Ile Glu Thr Glu Lys Arg Cys
1 5 10 15

Lys Gln Lys Glu Gln Arg Tyr Ile Pro Val Lys Tyr Leu Phe Ser Met
20 25 30

Thr Gln Ile His Gln Ile Asn Asp Ile Asp Val His Arg Ile Thr Ser
35 40 45

Gly Gln Val Ile Thr Asp Leu Thr Thr Ala Val Lys Glu Leu Val Asp
50 55 60

Asn Ser Ile Asp Ala Asn Ala Asn Gln Ile Glu Ile Ile Phe Lys Asp
65 70 75 80

Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp Asn Gly Asp Gly Ile Asp
85 90 95

Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys His Tyr Thr Ser Lys Ile
100 105 110

Ala Lys Phe Gln Asp Val Ala Lys Val Gln Thr Leu Gly Phe Arg Gly
115 120 125

Glu Ala Leu Ser Ser Leu Cys Gly Ile Ala Lys Leu Ser Val Ile Thr
130 135 140

Thr Thr Ser Pro Pro Lys Ala Asp Lys Glu Leu Tyr Asp Met Val Gly
145 150 155 160

His Ile Thr Ser Lys Thr Thr Thr Ser Arg Asn Lys Gly Thr Thr Val
165 170 175

Leu Val Ser Gln Leu Phe His Asn Leu Pro Val Arg Gln Lys Glu Phe
180 185 190

Ser Lys Thr Phe Lys Arg Gln Phe Thr Lys Cys Leu Thr Val Ile Gln
195 200 205

Gly Tyr Ala Ile Ile Asn Ala Ala Ile Lys Phe Ser Val Trp Asn Ile
210 215 220

Thr Pro Lys Gly Lys Lys Asn Leu Ile Leu Ser Thr Met Arg Asn Ser
225 230 235 240

Ser Met Arg Lys Asn Ile Ser Ser Val Phe Gly Ala Gly Gly Met Arg
245 250 255

Gly Glu Leu Glu Val Asp Leu Val Leu Asp Leu Asn Pro Phe Lys Asn
260 265 270

Arg Met Leu Gly Lys Tyr Thr Asp Asp Pro Asp Phe Leu Asp Leu Asp
275 280 285

Tyr Lys II- ^ Val Lys Gly Tyr Ile Ser Gln Asn Ser Phe Gly Cys

Gly £ Asn Ser Lys Asp Arg Phe He Tyr VaX Asn Lys Arg Pro 

310 31 
Val Glu Tyr Ser Thr Leu Leu Lys Cys Cys Asn Glu Val Tyr x*. Thr 

325 230 
Phe Asn Asn val Gin Phe Pro A1 a Val Phe Leu Asn Leu Glu Leu Pro 

340 345 
Met Ser Leu He A ap Val A sn Val Thr Pro Asp Lys Arg Val He Leu 

355 360 
L eu His A sn Glu Arg Ala Val He Asp He Phe Lys Thr Thr Leu Ser 

^70 375 

Mp xv, Tyr ». «. Cl» «. — M. «- — «* <*' £ 

, ftS 390 395 

Gln ser Glu Gin Gin Ala Gin Lys Arg Leu Leu Thr Glu Val Phe Asp 



405 41° 



ASP ASP Phe Lys Lys Met Glu Val Val Gly Gin Phe A sn Leu Gly Phe 
420 4 ^ , 

lie He Val Thr Arg Lys Val A sp Asn Lys Ser Asp Leu Phe He Val 

435 440 445 

455 ' 460 

Val Thr Val Phe Lys Ser Gin Lys Leu He He Pro Gin Pro Val.Glu 

470 

Z Ser Val He Asp Glu Leu Val Val Leu Asp Asn Leu Pro Val Phe 
485- 490 4 f 

500 505 
ser Arg Val Lys Leu Leu Ser Leu Pro Thr Ser Lys Gin Thr Leu Phe 

520 525 
ASP Leu g" Asp Phe Asn Glu Leu He His Leu He Lys Glu Asp Gly 

535 540 
Oly Leu Arg Arg Asp Asn He Arg Cys Ser Lys He Arg Ser Met Phe 
,,«. 550 555 

A la Met Arg Ala Cys Arg Ser Ser He Met He Gly Lye Pro Leu Asn 

565 570 
Lys L ys Thr Met Thr Arg Val Val His Asn Leu Ser Glu Leu Asp Lys 

Pro Trp Asn Cys Pro His Gly Arg Pro Thr Met Arg His Leu 



595 



600

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2484 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG 60

ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GGCCAGCTAA TGCTATCAAA 120 

GAGATGATTG AGAACTGTTT AGATGCAAAA TCCACAAGTA TTCAAGTGAT TGTTAAAGAG 180 

GGAGGCCTGA AGTTGATTCA GATCCAAGAC AATGGCACCG GGATCAGGAA AGAAGATCTG 240 

GATATTGTAT GTGAAAGGTT CACTACTAGT AAACTGCAGT CCTTTGAGGA TTTAGCCAGT 300 

ATTTCTACCT ATGGCTTTCG AGGTGAGGCT TTGGCCAGCA TAAGCCATGT GGCTCATGTT 360 

ACTATTACAA CGAAAACAGC TGATGGAAAG TGTGCATACA GAGCAAGTTA CTCAGATGGA 420 

AAACTGAAAG CCCCTCCTAA ACCATGTGCT GGCAATCAAG GGACCCAGAT CACGGTGGAG 480 

GACCTTTTTT ACAACATAGC CACGAGGAGA AAAGCTTTAA AAAATCCAAG TGAAGAATAT 540 

GGGAAAATTT TGGAAGTTGT TGGCAGGTAT TCAGTACACA ATGCAGGCAT TAGTTTCTCA 600 

GTTAAAAAAC AAGGAGAGAC AGTAGCTGAT GTTAGGACAC TACCCAATGC CTCAACCGTG 660 

GACAATATTC GCTCCATCTT TGGAAATGCT GTTAGTCGAG AACTGATAGA AATTGGATGT 720 

GAGGATAAAA CCCTAGCCTT CAAAATGAAT GGTTACATAT CCAATGCAAA CTACTCAGTG 780

AAGAAGTGCA TCTTCTTACT CTTCATCAAC CATCGTCTGG TAGAATCAAC TTCCTTGAGA 840 

AAAGCCATAG AAACAGTGTA TGCAGCCTAT TTGCCCAAAA ACACACACCC ATTCCTGTAC 900 

CTGAGTTTAG AAATCAGTCC CCAGAATGTG GATGTTAATG TGCACCCCAC AAAGCATGAA 960 

GTTCACTTCC TGCACGAGGA GAGCATCCTG GAGCGGGTGC AGCAGCACAT CGAGAGCAAG 1020 

CTCCTGGGCT CCAATTCCTC CAGGATGTAC TTCACCCAGA CTTTGCTACC AGGACTTGCT 1080 

GGCCCCTCTG GGGAGATGGT TAAATCCACA ACAAGTCTGA CCTCGTCTTC TACTTCTGGA 1140 

AGTAGTGATA AGGTCTATGC CCACCAGATG GTTCGTACAG ATTCCCGGGA ACAGAAGCTT 1200 

GATGCATTTC TGCAGCCTCT GAGCAAACCC CTGTCCAGTC AGCCCCAGGC CATTGTCACA 1260 

GAGGATAAGA CAGATATTTC TAGTGGCAGG GCTAGGCAGC AAGATGAGGA GATGCTTGAA 1320 

CTCCCAGCCC CTGCTGAAGT GGCTGCCAAA AATCAGAGCT TGGAGGGGGA TACAACAAAG 1380 

GGGACTTCAG AAATGTCAGA GAAGAGAGGA CCTACTTCCA GCAACCCCAG AAAGAGACAT 1440 

CGGGAAGATT CTGATGTGGA AATGGTGGAA GATGATTCCC GAAAGGAAAT GACTGCAGCT 1500 

TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT TGAGTCTCCA GGAAGAAATT 1560 

AATGAGCAGG GACATGAGGT TCTCCGGGAG ATGTTGCATA ACCACTCCTT CGTGGGCTGT 1620 

GTGAATCCTC AGTGGGCCTT GGCACAGCAT CAAACCAAGT TATACCTTCT CAACACCACC 1680 

AAGCTTAGTG AAGAACTGTT CTACCAGATA CTCATTTATG ATTTTGCCAA TTTTGGTGTT 1740 

CTCAGGTTAT CGGAGCCAGC ACCGCTCTTT GACCTTGCCA TGCTTGCCTT AGATAGTCCA 1800 

GAGAGTGGCT GGACAGAGGA AGATGGTCCC AAAGAAGGAC TTGCTGAATA CATTGTTGAG 1860 

TTTCTGAAGA AGAAGGCTGA GATGCTTGCA GACTATTTCT CTTTGGAAAT TGATGAGGAA 1920 

GGGAACCTGA TTGGATTACC CCTTCTGATT GACAACTATG TGCCCCCTTT GGAGGGACTG 1980 

CCTATCTTCA TTCTTCGACT AGCCACTGAG GTGAATTGGG ACGAAGAAAA GGAATGTTTT 2040 

GAAAGCCTCA GTAAAGAATG CGCTATGTTC TATTCCATCC GGAAGCAGTA CATATCTGAG 2100 

GAGTCGACCC TCTCAGGCCA GCAGAGTGAA GTGCCTGGCT CCATTCCAAA CTCCTGGAAG 2160 

TGGACTGTGG AACACATTGT CTATAAAGCC TTGCGCTCAC ACATTCTGCC TCCTAAACAT 2220

TTCACAGAAG ATGGAAATAT CCTGCAGCTT GCTAACCTGC CTGATCTATA CAAAGTCTTT 2280

GAGAGGTGTT AAATATGGTT ATTTATGCAC TGTGGGATGT GTTCTTCTTT CTCTGTATTC 2340 

CGATACAAAG TGTTGTATCA AAGTGTGATA TACAAAGTGT ACCAACATAA GTGTTGGTAG 2400 

CACTTAAGAC TTATACTTGC CTTCTGATAG TATTCCTTTA TACACAGTGG ATTGATTATA 2460 

AATAAATAGA TGTGTCTTAA CATA 2484 



<2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 756 amino acids 

(B) TYPE: amino acid 

<C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Ser Phe Val Ala Gly Val Ile Arg Arg Leu Asp Glu Thr Val Val
1 5 10 15

Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn Ala Ile
20 25 30

Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Ser Thr Ser Ile Gln
35 40 45

Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln Ile Gln Asp Asn
50 55 60

Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile Val Cys Glu Arg Phe
65 70 75 80

Thr Thr Ser Lys Leu Gln Ser Phe Glu Asp Leu Ala Ser Ile Ser Thr
85 90 95

Tyr Gly Phe Arg Gly Glu Ala Leu Ala Ser Ile Ser His Val Ala His
100 105 110

Val Thr Ile Thr Thr Lys Thr Ala Asp Gly Lys Cys Ala Tyr Arg Ala
115 120 125

Ser Tyr Ser Asp Gly Lys Leu Lys Ala Pro Pro Lys Pro Cys Ala Gly
130 135 140

Asn Gln Gly Thr Gln Ile Thr Val Glu Asp Leu Phe Tyr Asn Ile Ala
145 150 155 160

Thr Arg Arg Lys Ala Leu Lys Asn Pro Ser Glu Glu Tyr Gly Lys Ile
165 170 175

Leu Glu Val Val Gly Arg Tyr Ser Val His Asn Ala Gly Ile Ser Phe
180 185 190

Ser Val Lys Lys Gln Gly Glu Thr Val Ala Asp Val Arg Thr Leu Pro
195 200 205

Asn Ala Ser Thr Val Asp Asn Ile Arg Ser Ile Phe Gly Asn Ala Val
210 215 220

Ser Arg Glu Leu Ile Glu Ile Gly Cys Glu Asp Lys Thr Leu Ala Phe
225 230 235 240

Lys Met Asn Gly Tyr Ile Ser Asn Ala Asn Tyr Ser Val Lys Lys Cys
245 250 255

Ile Phe Leu Leu Phe Ile Asn His Arg Leu Val Glu Ser Thr Ser Leu
260 265 270

Arg Lys Ala Ile Glu Thr Val Tyr Ala Ala Tyr Leu Pro Lys Asn Thr
275 280 285

His Pro Phe Leu Tyr Leu Ser Leu Glu Ile Ser Pro Gln Asn Val Asp
290 295 300

Val Asn Val His Pro Thr Lys His Glu Val His Phe Leu His Glu Glu
305 310 315 320

Ser Ile Leu Glu Arg Val Gln Gln His Ile Glu Ser Lys Leu Leu Gly
325 330 335

Ser Asn Ser Ser Arg Met Tyr Phe Thr Gln Thr Leu Leu Pro Gly Leu
340 345 350

Ala Gly Pro Ser Gly Glu Met Val Lys Ser Thr Thr Ser Leu Thr Ser
355 360 365

Ser Ser Thr Ser Gly Ser Ser Asp Lys Val Tyr Ala His Gln Met Val
370 375 380

Arg Thr Asp Ser Arg Glu Gln Lys Leu Asp Ala Phe Leu Gln Pro Leu
385 390 395 400

Ser Lys Pro Leu Ser Ser Gln Pro Gln Ala Ile Val Thr Glu Asp Lys
405 410 415

Thr Asp Ile Ser Ser Gly Arg Ala Arg Gln Gln Asp Glu Glu Met Leu
420 425 430

Glu Leu Pro Ala Pro Ala Glu Val Ala Ala Lys Asn Gln Ser Leu Glu
435 440 445

Gly Asp Thr Thr Lys Gly Thr Ser Glu Met Ser Glu Lys Arg Gly Pro
450 455 460

Thr Ser Ser Asn Pro Arg Lys Arg His Arg Glu Asp Ser Asp Val Glu
465 470 475 480

Met Val Glu Asp Asp Ser Arg Lys Glu Met Thr Ala Ala Cys Thr Pro
485 490 495

Arg Arg Arg Ile Ile Asn Leu Thr Ser Val Leu Ser Leu Gln Glu Glu
500 505 510

Ile Asn Glu Gln Gly His Glu Val Leu Arg Glu Met Leu His Asn His
515 520 525

Ser Phe Val Gly Cys Val Asn Pro Gln Trp Ala Leu Ala Gln His Gln
530 535 540

Thr Lys Leu Tyr Leu Leu Asn Thr Thr Lys Leu Ser Glu Glu Leu Phe
545 550 555 560

Tyr Gln Ile Leu Ile Tyr Asp Phe Ala Asn Phe Gly Val Leu Arg Leu
565 570 575

Ser Glu Pro Ala Pro Leu Phe Asp Leu Ala Met Leu Ala Leu Asp Ser 
580 585 590 
Pro Glu Ser Gly Trp Thr Glu Glu Asp Gly Pro Lys Glu Gly Leu Ala
595 600 605

Glu Tyr Ile Val Glu Phe Leu Lys Lys Lys Ala Glu Met Leu Ala Asp
610 615 620

Tyr Phe Ser Leu Glu Ile Asp Glu Glu Gly Asn Leu Ile Gly Leu Pro
625 630 635 640

Leu Leu Ile Asp Asn Tyr Val Pro Pro Leu Glu Gly Leu Pro Ile Phe
645 650 655

Ile Leu Arg Leu Ala Thr Glu Val Asn Trp Asp Glu Glu Lys Glu Cys
660 665 670



Phe Glu Ser Leu Ser Lys Glu Cys Ala Met Phe Tyr Ser Ile Arg Lys
675 680 685

Gln Tyr Ile Ser Glu Glu Ser Thr Leu Ser Gly Gln Gln Ser Glu Val
690 695 700

Pro Gly Ser Ile Pro Asn Ser Trp Lys Trp Thr Val Glu His Ile Val
705 710 715 720

Tyr Lys Ala Leu Arg Ser His Ile Leu Pro Pro Lys His Phe Thr Glu
725 730 735

Asp Gly Asn Ile Leu Gln Leu Ala Asn Leu Pro Asp Leu Tyr Lys Val
740 745 750

Phe Glu Arg Cys
755

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 397 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TGGCTGGATG CTAAGCTACA GCTGAAGGAA GAACGTGAGC ACGAGGCACT GAGGTGATTG 60 
GCTGAAGGCA CTTCCGTTGA GCATCTAGAC GTTTCCTTGG CTCTTCTGGC GCCAAAATGT 120 
CGTTCGTGGC AGGGGTTATT CGGCGGCTGG ACGAGACAGT GGTGAACCGC ATCGCGGCGG 180 
GGGAAGTTAT CCAGCGGCCA GCTAATGCTA TCAAAGAGAT GATTGAGAAC TGGTACGGAG 240 
GGAGTCGAGC CGGGCTCACT TAAGGGCTAC GACTTAACGG GCCGCGTCAC TCAATGGCGC 300 
GGACACGCCT CTTTCCCCGG GCAGAGGCAT GTACAGCGCA TGCCCACAAC GGCGGAGGCC 360 
GCCGGGTTCC CTACGTGCCA TAAGCCTTCT CCTTTTC 397 



WO 95/16793 



PCT/US94/14746 




(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 393 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AAACACGTTA ATGAGGCACT ATTGTTTGTA TTTGGAGTTT GTTATCATTG CTTGGCTCAT 60 
ATTAAAATAT GTACATTAGA GTAGTTGCAG ACTGATAAAT TATTTTCTGT TTGATTTGCC 120 
AGTTTAGATG CAAAATCCAC AAGTATTCAA GTGATTGTTA AAGAGGGAGG CCTGAAGTTG 180 
ATTCAGATCC AAGACAATGG CACCGGGATC AGGGTAAGTA AAACCTCAAA GTAGCAGGAT 240 
GTTTGTGCGC TTCATGGAAG AGTCAGGACC TTTCTCTGTT CTGGAAACTA GGCTTTTGCA 300 
GATGGGATTT TTTCACTGAA AAATTCAACA CCAACAATAA ATATTTATTG AGTACCTATT 360 
ATTTGCGGGG CACTGTTCAG GGGATGTGTC AGT 393 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 352 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TTTCCTGGAT TAATCAAGAA ATGGAATTCA AAGAGATTTG GAAAATGAGT AACATGATTA 60

TTTACTCATC TTTTTGGTAT CTAACAGAAA GAAGATCTGG ATATTGTATG TGAAAGGTTC 120

ACTACTAGTA AACTGCAGTC CTTTGAGGAT TTAGCCAGTA TTTCTACCTA TGGCTTTCGA 180

GGTGAGGTAA GCTAAAGATT CAAGAAATGT GTAAAATATC CTCCTGTGAT GACATTGTCT 240

GTCATTTGTT AGTATGTATT TCTCAACATA GATAAATAAG GTTTGGTACC TTTTACTTGT 300

TAAATGTATG CAAATCTGAG CAAACTTAAT GAACTTTAAC TTTCAAAGAC TG 352

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 287 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

TGGAAGCAGC AGCAGATAAC CTTTCCCTTT GGTGAGGTGA CAGTGGGTGA CCCAGCAGTG 60 

AGTTTTTCTT TCAGTCTATT TTCTTTTCTT CCTTAGGCTT TGGCCAGCAT AAGCCATGTG 120 

GCTCATGTTA CTATTACAAC GAAAACAGCT GATGGAAAGT GTGCATACAG GTATAGTGCT 180 

GACTTCTTTT ACTCATATAT ATTCATTCTG AAATGTATTT TGGGCCTAGG TCTCAGAGTA 240 

ATCCTGTCTC AACACCAGTG TTATCTTTGG CAGAGATCTT GAGTACG 287 
(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 336 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TTGATATGAT TTTCTCTTTT CCCCTTGGGA TTAGTATCTA TCTCTCTACT GGATATTAAT 60 
TTGTTATATT TTCTCATTAG AGCAAGTTAC TCAGATGGAA AACTGAAAGC CCCTCCTAAA 120 
CCATGTGCTG GCAATCAAGG GACCCAGATC ACGGTAAGAA TGGTACATGG GAGAGTAAAT 180 
TGTTGAAGCT TTGTTTGTAT AAATATTGGA ATAAAAAATA AAATTGCTTC TAAGTTTTCA 240 
GGGTAATAAT AAAATGAATT TGCACTAGTT AATGGAGGTC CCAAGATATC CTCTAAGCAA 300 
GATAAATGAC TATTGGCTTT TTGGCATGGC AGCCTG 336 

(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 275 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GCTTTTGCCA GGACCATCTT GGGTTTTATT TTCAAGTACT TCTATGAATT TACAAGAAAA 60 

ATCAATCTTC TGTTCAGGTG GAGGACCTTT TTTACAACAT AGCCACGAGG AGAAAAGCTT 120 

TAAAAAATCC AAGTGAAGAA TATGGGAAAA TTTTGGAAGT TGTTGGCAGG TACAGTCCAA 180 

AATCTGGGAG TGGGTCTCTG AGATTTGTCA TCAAAGTAAT GTGTTCTAGT GCTCATACAT 240 

TGAACAGTTG CTGAGCTAGA TGGTGAAAAG TAAAA 275



(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 389 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CAGCAACCTA TAAAAGTAGA GAGGAGTCTG TGTTTTGACG CAGCACCTTT AGCATTTTTA 60 
TTTGGATGAA GTTTCTGCTG GTTTATTTTT CTGTGGGTAA AATATTAATA GGCTGTATGG 120 
AGATATTTTT CTTTATATGT ACCTTTGTTT AGATTACTCA ACTCCACTAA TTTATTTAAC 180
TAAAAGGGGG CTCTGACATC TAGTGTGTGT TTTTGGCAAC TCTTTTCTTA CTCTTTTGTT 240 
TTTCTTTTCC AGGTATTCAG TACACAATGC AGGCATTAGT TTCTCAGTTA AAAAAGTAAG 300 
TTCTTGGTTT ATGGGGGATG GTTTTGTTTT ATGAAAAGAA AAAAGGGGAT TTTTAATAGT 360 
TTGCTGGTGG AGATAAGGTT ATGATGTTT 389 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 381 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
ATGTTTCAGT CTCAGCCATG AGACAATAAA TCCTTGTGTC TTCTGCTGTT TGTTTATCAG 60 
CAAGGAGAGA CAGTAGCTGA TGTTAGGACA CTACCCAATG CCTCAACCGT GGACAATATT 120 
CGCTCCATCT TTGGAAATGC TGTTAGTCGG TATGTCGATA ACCTATATAA AAAAATCTTT 180 
TACATTTATT ATCTTGGTTT ATCATTCCAT CACATTATTT GGGAACCTTT CAAGATATTA 240 
TGTGTGTTAA GAGTTTGCTT TAGTCAAATA CACAGGCTTG TTTTATGCTT CAGATTTGTT 300 
AATGGAGTTC TTATTTCACG TAATCAACAC TTTCTAGGTG TATGTAATCT CCTAGATTCT 360 
GTGGCGTGAA TCATGTGTTC T 381 

(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 526 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
ACTGAGTAGG GTAGGTGGGT GAGTGGGTGG GTGGGTGGGT GGGTGGATGG ATGGATGGGA 60 
GGATGGGTGG GTGAATGGGT GAACAGACAA ATGGATGGAT GAATGGACAG GCACAGGAGG 120 
ACCTCAAATG GACCAAGTCT TCGGGGCCCT CATTTCACAA AGTTAGTTTA TGGGAAGGAA 180 
CCTTGTGTTT TTAAATTCTG AT *1 *J f TTTGT AATGTTTGAG TTTTGAGTAT TTTCAAAAGC 240 
TTCAGAATCT CTTTTCTAAT AGAGAACTGA TAGAAATTGG ATGTGAGGAT AAAACCCTAG 300 
CCTTCAAAAT GAATGGTTAC ATATCCAATG CAAACTACTC AGTGAAGAAG TGCATCTTCT 360 
TACTCTTCAT CAACCGTAAG TTAAAAAGAA CCACATGGGA AATCCACTCA CAGGAAACAC 420 
CCACAGGGAA TTTTATGGGA CCATGGAAAA ATTTCTGAGT CCATAGGTTT GATTAAACAT 480 
GGAGAAACCT CATGGCAAAG TTTGGTTTTA TTGGGAAGCA TGTATA 526 

(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 434 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
ATAGTGGGCT GGAAAGTGGC CACAGGTAAA GGTGCACCTT TCTTCCTGGG GATGTGATGT 60 
GCATATCACT ACAGAAATGT CTTTCCTGAG GTGATGTCAT GACTTTGTGT GAATGTACAC 120 
CTGTGACCTC ACCCCTCAGG ACAGTTTTGA ACTGGTTGCT TTCTTTTTAT TGTTTAGATC 180 




GTCTGGTAGA ATCAACTTCC TTGAGAAAAG CCATAGAAAC AGTGTATGCA GCCTATTTGC 240 

CCAAAAACAC ACACCCATTC CTGTACCTCA GGTAATGTAG CACCAAACTC CTCAACCAAG 300 

ACTCACAAGG AACAGATGTT CTATCAGGCT CTCCTCTTTG AAAGAGATGA GCATGCTAAT 360 

AGTACAATCA GAGTGAATCC CATACACCAC TGGCAAAAGG ATGTTCTGTC CCTTCTTACA 420 

GGTACAAGGC ACAG 434 



(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 458 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CTTACGCAAA GCTACACAGC TCTTAAGTAG CAGTGCCAAT ATTTGAACAC ACTCAGACTC 60
GAGCCTGAGG TTTTGACCAC TGTGTCATCT GGCCTCAAAT CTTCTGGCCA CCACATACAC 120
CATATGTGGG CTTTTTCTCC CCCTCCCACT ATCTAAGGTA ATTGTTCTCT CTTATTTTCC 180
TGACAGTTTA GAAATCAGTC CCCAGAATGT GGATGTTAAT GTGCACCCCA CAAAGCATGA 240
AGTTCACTTC CTGCACGAGG AGAGCATCCT GGAGCGGGTG CAGCAGCACA TCGAGAGCAA 300
GCTCCTGGGC TCCAATTCCT CCAGGATGTA CTTCACCCAG GTCAGGGCGC TTCTCATCCA 360
GCTACTTCTC TGGGGCCTTT GAAATGTGCC CGGCCAGACG TGAGAGCCCA GATTTTTGCT 420
GTTATTTAGG AACTTTTTTT GAAGTATTAC CTGGATAG 458



(2) INFORMATION FOR SEQ ID NO: 17: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 618 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GATAATTATA CCTCATACTA GCTTCTTTCT TAGTACTGCT CCATTTGGGG ACCTGTATAT 60 
CTATACTTCT TATTCTGAGT CTCTCCACTA TATATATATA TATATATATA TTTTTTTTTT 120 
TTTTTTTTTT TAATACAGAC TTTGCTACCA GGACTTGCTG GCCCCTCTGG GGAGATGGTT 180 
AAATCCACAA CAAGTCTGAC CTCGTCTTCT ACTTCTGGAA GTAGTGATAA GGTCTATGCC 240 
CACCAGATGG TTCGTACAGA TTCCCGGGAA CAGAAGCTTG ATGCATTTCT GCAGCCTCTG 300 
AGCAAACCCC TGTCCAGTCA GCCCCAGGCC ATTGTCACAG AGGATAAGAC AGATATTTCT 360 
AGTGGCAGGG CTAGGCAGCA AGATGAGGAG ATGCTTGAAC TCCCAGCCCC TGCTGAAGTG 420 
GCTGCCAAAA ATCAGAGCTT GGAGGGGGAT ACAACAAAGG GGACTTCAGA AATGTCAGAG 480 
AAGAGAGGAC CTACTTCCAG CAACCCCAGG TATGGCCTTT TGGGAAAAGT ACAGCCTACC 540 
TCCTTTATTC TGTAATAAAA CTGCCTTCTA ACTTTGGCTT TTCATGAATC ACTTGCATCT 600 
TCTCTCTGCC GACTTCCC 618 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 478 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 
CTGTGCTCCA GCACAGGTCA TCCAGCTCTG TAGACCAGCG CAGAGAAGTT GCTTGCTCCC 60 
AAATGCAACC CACAAAATTT GGCTAAGTTT AAAAACAAGA ATAATAATGA TCTGCACTTC 120 
CTTTTCTTCA TTGCAGAAAG AGACATCGGG AAGATTCTGA TGTGGAAATG GTGGAAGATG 180 
ATTCCCGAAA GGAAATGACT GCAGCTTGTA CCCCCCGGAG AAGGATCATT AACCTCACTA 240 
GTGTTTTGAG TCTCCAGGAA GAAATTAATG AGCAGGGACA TGAGGGTACG TAAACGCTGT 300 
GGCCTGCCTG GGATGCATAG GGCCTCAACT GCCAAGGTTT TGGAAATGGA GAAAGCAGTC 360 
ATGTTGTCAG AGTGGCACTA CAGTTTTGAT GGGCAAGCTC CTCTTCCTTT ACTAACCCAC 420 
AATAGCATCA GCTTAAAGAC AATTTTTGAT TGGGAGAAAA GGGAGAAAAT AATCTCTG 478

(2) INFORMATION FOR SEQ ID NO: 19: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 377 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

CAGTTTTCAC CAGGAGGCTC AAATCAGGCC TTTGCTTACT TGGTGTCTCT AGTTCTGGTG 60 

CCTGGTGCTT TGGTCAATGA AGTGGGGTTG GTAGGATTCT ATTACTTACC TGTTTTTTGG 120 

TTTTATTTTT TGTTTTGCAG TTCTCCGGGA GATGTTGCAT AACCACTCCT TCGTGGGCTG 180

TGTGAATCCT CAGTGGGCCT TGGCACAGCA TCAAACCAAG TTATACCTTC TCAACACCAC 240 

CAAGCTTAGG TAAATCAGCT GAGTGTGTGA ACAAGCAGAG CTACTACAAC AATGGTCCAG 300 

GGAGCACAGG CACAAAAGCT AAGGAGAGCA GCATGAAGGT AGTTGGGAAG GGCACAGGCT 360 

TTGGAGTCAG CACATGT 377 

(2) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 325 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CCCCTGGTTG AAGCGTTGGA ATCCCACTCT TTGGAAGATT GTGTTAGACT GTTAACCAGA 60 

TTCCACAGCC AGGCAGAACT ATGTCTGTCT CATCCATGTG TCAGGGATTA CGTCTCCCAT 120 

TTGTCCCAAC TGGTTGTATC TCAAGCATGA ATTCAGCTTT TCCTTAAAGT CACTTCATTT 180 

TTATTTTCAG TGAAGAACTG TTCTACCAGA TACTCATTTA TGATTTTGCC AATTTTGGTG 240
TTCTCAGGTT ATCGGTAAGT TTAGATCCTT TTCACTTCTG ACATTTCAAC TGACCGCCCC 300
GCAAACAGTA GCTCTCCACT AAATA 325 

(2) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 341 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

CATTTATGGT TTCTCACCTG CCATTCTGAT AGTGGATTCT TGGGAATTCA GGCTTCATTT 60 

GGATGCTCCG TTAAAGCTTG CTCCTTCATG TTCTTGCTTC TTCCTAGGAG CCAGCACCGC 120 

TCTTTGACCT TGCCATGCTT GCCTTAGATA GTCCAGAGAG TGGCTGGACA GAGGAAGATG 180 

GTCCCAAAGA AGGACTTGCT GAATACATTG TTGAGTTTCT GAAGAAGAAG GCTGAGATGC 240 

TTGCAGACTA TTTCTCTTTG GAAATTGATG AGGTGTGACA GCCATTCTTA TACTTCTGTT 300 

GTATTCTCCA AATAAAATTT CCAGCCGGGT GCATTGGCTC A 341 

(2) INFORMATION FOR SEQ ID NO: 22: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 260 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

CAGATAGGAG GCACAAGGCC TGGGAAAGGC ACTGGAGAAA TGGGATTTGT TTAAACTATG 60 

ACAGCATTAT TTCTTGTTCC CTTGTCCTTT TTCCTGCAAG CAGGAAGGGA ACCTGATTGG 120 

ATTACCCCTT CTGATTGACA ACTATGTGCC CCCTTTGGAG GGACTGCCTA TCTTCATTCT 180 

TCGACTAGCC ACTGAGGTCA GTGATCAAGC AGATACTAAG CATTTCGGTA CATGCATGTG 240 

TGCTGGAGGG AAAGGGCAAA 260 

(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 340 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

CTATATCTTC CCAGCAATAT TCACAGTCCG TTTACAGTTT TAACGCCTAA AGTATCACAT 60 

TTCGTTTTTT AGCTTTAAGT AGTCTGTGAT CTCCGTTTAG AATGAGAATG TTTAAATTCG 120 

TACCTATTTT GAGGTATTGA ATTTCTTTGG ACCAGGTGAA TTGGGACGAA GAAAAGGAAT 180 

GTTTTGAAAG CCTCAGTAAA GAATGCGCTA TGTTCTATTC CATCCGGAAG CAGTACATAT 240 
CTGAGGAGTC GACCCTCTCA GGCCAGCAGG TACAGTGGTG ATGCACACTG GCACCCCAGG 300
ACTAGGACAG GACCTCATAC ATCTTAGGAG ATGAAACTTG 340 

(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 563 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

AATCCTCTTG TGTTCAGGCC TGTGGATCCC TGAGAGGCTA GCCCACAAGA TCCACTTCAA 60 

AAGCCCTAGA TAACACCAAG TCTTTCCAGA CCCAGTGCAC ATCCCATCAG CCAGGACACC 120 

AGTGTATGTT GGGATGCAAA CAGGGAGGCT TATGACATCT AATGTGTTTT CCAGAGTGAA 180 

GTGCCTGGCT CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC 240 

TTGCGCTCAC ACATTCTGCC TCCTAAACAT TTCACAGAAG ATGGAAATAT CCTGCAGCTT 300 

GCTAACCTGC CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC 360 

TGTGGGATGT GTTCTTCTTT CTCTGTATTC CGATACAAAG TGTTGTATCA AAGTGTGATA 420 

TACAAAGTGT ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG 480 

TATTCCTTTA TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAATTTCT 540 

TATTTAATTT TATTATGTAT ATA 563 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 137 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG 60 
ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GGCCAGCTAA TGCTATCAAA 120 
GAGATGATTG AGAACTG 137 

(2) INFORMATION FOR SEQ ID NO: 26: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
TTTAGATGCA AAATCCACAA GTATTCAAGT GATTGTTAAA GAGGGAGGCC TGAAGTTGAT 60 
TCAGATCCAA GACAATGGCA CCGGGATCAG G 91 
(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 99 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
AAAGAAGATC TGGATATTGT ATGTGAAAGG TTCACTACTA GTAAACTGCA GTCCTTTGAG 60 
GATTTAGCCA GTATTTCTAC CTATGGCTTT CGAGGTGAG 99 



(2) INFORMATION FOR SEQ ID NO: 28: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 74 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
GCTTTGGCCA GCATAAGCCA TGTGGCTCAT GTTACTATTA CAACGAAAAC AGCTGATGGA 60
AAGTGTGCAT ACAG 74 

(2) INFORMATION FOR SEQ ID NO: 29: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 73 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
AGCAAGTTAC TCAGATGGAA AACTGAAAGC CCCTCCTAAA CCATGTGCTG GCAATCAAGG 60 
GACCCAGATC ACG 73 



(2) INFORMATION FOR SEQ ID NO: 30: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 92 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
GTGGAGGACC TTTTTTACAA CATAGCCACG AGGAGAAAAG CTTTAAAAAA TCCAAGTGAA 60 
GAATATGGGA AAATTTTGGA AGTTGTTGGC AG 92
(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
GTATTCAGTA CACAATGCAG GCATTAGTTT CTCAGTTAAA AAA 43 



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 89 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
CAAGGAGAGA CAGTAGCTGA TGTTAGGACA CTACCCAATG CCTCAACCGT GGACAATATT 60 
CGCTCCATCT TTGGAAATGC TGTTAGTCG 89 

(2) INFORMATION FOR SEQ ID NO: 33: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 113 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
AGAACTGATA GAAATTGGAT GTGAGGATAA AACCCTAGCC TTCAAAATGA ATGGTTACAT 60 
ATCCAATGCA AACTACTCAG TGAAGAAGTG CATCTTCTTA CTCTTCATCA ACC 113 



(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
ATCGTCTGGT AGAATCAACT TCCTTGAGAA AAGCCATAGA AACAGTGTAT GCAGCCTATT 60 
TGCCCAAAAA CACACACCCA TTCCTGTACC TCAG 94 
(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 154 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TTTAGAAATC AGTCCCCAGA ATGTGGATGT TAATGTGCAC CCCACAAAGC ATGAAGTTCA 60
CTTCCTGCAC GAGGAGAGCA TCCTGGAGCG GGTGCAGCAG CACATCGAGA GCAAGCTCCT 120
GGGCTCCAAT TCCTCCAGGA TGTACTTCAC CCAG 154



(2) INFORMATION FOR SEQ ID NO: 36: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 371 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
ACTTTGCTAC CAGGACTTGC TGGCCCCTCT GGGGAGATGG TTAAATCCAC AACAAGTCTG 60
ACCTCGTCTT CTACTTCTGG AAGTAGTGAT AAGGTCTATG CCCACCAGAT GGTTCGTACA 120
GATTCCCGGG AACAGAAGCT TGATGCATTT CTGCAGCCTC TGAGCAAACC CCTGTCCAGT 180 
CAGCCCCAGG CCATTGTCAC AGAGGATAAG ACAGATATTT CTAGTGGCAG GGCTAGGCAG 240 
CAAGATGAGG AGATGCTTGA ACTCCCAGCC CCTGCTGAAG TGGCTGCCAA AAATCAGAGC 300 
TTGGAGGGGG ATACAACAAA GGGGACTTCA GAAATGTCAG AGAAGAGAGG ACCTACTTCC 360 
AGCAACCCCA G 371 



(2) INFORMATION FOR SEQ ID NO: 37: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 149 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
AAAGAGACAT CGGGAAGATT CTGATGTGGA AATGGTGGAA GATGATTCCC GAAAGGAAAT 60 
GACTGCAGCT TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT TGAGTCTCCA 120 
GGAAGAAATT AATGAGCAGG GACATGAGG 149 
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 109 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
TTCTCCGGGA GATGTTGCAT AACCACTCCT TCGTGGGCTG TGTGAATCCT CAGTGGGCCT 60
TGGCACAGCA TCAAACCAAG TTATACCTTC TCAACACCAC CAAGCTTAG 109

(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
TGAAGAACTG TTCTACCAGA TACTCATTTA TGATTTTGCC AATTTTGGTG TTCTCAGGTT 60
ATCG 64

(2) INFORMATION FOR SEQ ID NO: 40: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 165 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GAGCCAGCAC CGCTCTTTGA CCTTGCCATG CTTGCCTTAG ATAGTCCAGA GAGTGGCTGG 60
ACAGAGGAAG ATGGTCCCAA AGAAGGACTT GCTGAATACA TTGTTGAGTT TCTGAAGAAG 120
AAGGCTGAGA TGCTTGCAGA CTATTTCTCT TTGGAAATTG ATGAG 165

(2) INFORMATION FOR SEQ ID NO: 41: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 93 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 
GAAGGGAACC TGATTGGATT ACCCCTTCTG ATTGACAACT ATGTGCCCCC TTTGGAGGGA 60
CTGCCTATCT TCATTCTTCG ACTAGCCACT GAG 93
(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 114 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GTGAATTGGG ACGAAGAAAA GGAATGTTTT GAAAGCCTCA GTAAAGAATG CGCTATGTTC 60 
TATTCCATCC GGAAGCAGTA CATATCTGAG GAGTCGACCC TCTCAGGCCA GCAG 114 

(2) INFORMATION FOR SEQ ID NO: 43: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 360 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
AGTGAAGTGC CTGGCTCCAT TCCAAACTCC TGGAAGTGGA CTGTGGAACA CATTGTCTAT 60 
AAAGCCTTGC GCTCACACAT TCTGCCTCCT AAACATTTCA CAGAAGATGG AAATATCCTG 120 
CAGCTTGCTA ACCTGCCTGA TCTATACAAA GTCTTTGAGA GGTGTTAAAT ATGGTTATTT 180 
ATGCACTGTG GGATGTGTTC TTCTTTCTCT GTATTCCGAT ACAAAGTGTT GTATCAAAGT 240 
GTGATATACA AAGTGTACCA ACATAAGTGT TGGTAGCACT TAAGACTTAT ACTTGCCTTC 300 
TGATAGTATT CCTTTATACA CAGTGGATTG ATTATAAATA AATAGATGTG TCTTAACATA 360 

(2) INFORMATION FOR SEQ ID NO: 44: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
AGGCACTGAG GTGATTGGC 19
(2) INFORMATION FOR SEQ ID NO: 45:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
TCGTAGCCCT TAAGTGAGC 19

(2) INFORMATION FOR SEQ ID NO: 46:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
AATATGTACA TTAGAGTAGT TG 22

(2) INFORMATION FOR SEQ ID NO: 47:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
CAGAGAAAGG TCCTGACTC 19
(2) INFORMATION FOR SEQ ID NO: 48:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
AGAGATTTGG AAAATGAGTA AC 22



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
ACAATGTCAT CACAGGAGG 19 

(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AACCTTTCCC TTTGGTGAGG 20 
(2) INFORMATION FOR SEQ ID NO: 51:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
GATTACTCTG AGACCTAGGC 20 



(2) INFORMATION FOR SEQ ID NO: 52: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GATTTTCTCT TTTCCCCTTG GG 22 

(2) INFORMATION FOR SEQ ID NO: 53: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
CAAACAAAGC TTCAACAATT TAC 23 
(2) INFORMATION FOR SEQ ID NO: 54:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GGGTTTTATT TTCAAGTACT TCTATG 26

(2) INFORMATION FOR SEQ ID NO: 55:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
GCTCAGCAAC TGTTCAATGT ATGAGC 26

(2) INFORMATION FOR SEQ ID NO: 56:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
CTAGTGTGTG TTTTTGGC 18
(2) INFORMATION FOR SEQ ID NO: 57:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: 
CATAACCTTA TCTCCACC 18 



(2) INFORMATION FOR SEQ ID NO: 58:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: 
CTCAGCCATG AGACAATAAA TCC 23 



(2) INFORMATION FOR SEQ ID NO: 59: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
DNA" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 
GGTTCCCAAA TAATGTGATG G 21 
(2) INFORMATION FOR SEQ ID NO: 60:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
CAAAAGCTTC AGAATCTC 18 



(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic 

intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
CTGTGGGTGT TTCCTGTGAG TGG 23 



(2) INFORMATION FOR SEQ ID NO: 62: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
CATGACTTTG TGTGAATGTA CACC 24 
(2) INFORMATION FOR SEQ ID NO: 63:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
GAGGAGAGCC TGATAGAACA TCTG 24 



(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
GGGCTTTTTC TCCCCCTCCC 20 

(2) INFORMATION FOR SEQ ID NO: 65: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
AAAATCTGGG CTCTCACG 18 
(2) INFORMATION FOR SEQ ID NO: 66:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 

intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
AATTATACCT CATACTAGC 19 



(2) INFORMATION FOR SEQ ID NO: 67: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: 
GTTTTATTAC AGAATAAAGG AGG 23 



(2) INFORMATION FOR SEQ ID NO: 68: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
AAGCCAAAGT TAGAAGGCA 19 
(2) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
TGCAACCCAC AAAATTTGGC 20

(2) INFORMATION FOR SEQ ID NO: 70:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
CTTTCTCCAT TTCCAAAACC 20

(2) INFORMATION FOR SEQ ID NO: 71:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
TGGTGTCTCT AGTTCTGG 18

(2) INFORMATION FOR SEQ ID NO: 72:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 
CATTGTTGTA GTAGCTCTGC 20 



(2) INFORMATION FOR SEQ ID NO: 73: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
CCCATTTGTC CCAACTGG 18 

(2) INFORMATION FOR SEQ ID NO: 74: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
CGGTCAGTTG AAATGTCAG 19 
(2) INFORMATION FOR SEQ ID NO: 75:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
CATTTGGATG CTCCGTTAAA GC 22 



(2) INFORMATION FOR SEQ ID NO: 76: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
CACCCGGCTG GAAATTTTAT TTG 23

(2) INFORMATION FOR SEQ ID NO: 77: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
GGAAAGGCAC TGGAGAAATG GG 22 
(2) INFORMATION FOR SEQ ID NO: 78:
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: 
CCCTCCAGCA CACATGCATG TACCG 25 



(2) INFORMATION FOR SEQ ID NO: 79: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
TAAGTAGTCT GTGATCTCCG 20 

(2) INFORMATION FOR SEQ ID NO: 80: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:80: 
ATGTATGAGG TCCTGTCC 18
(2) INFORMATION FOR SEQ ID NO: 81:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 
GACACCAGTG TATGTTGG 18 



(2) INFORMATION FOR SEQ ID NO: 82: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
GAGAAAGAAG AACACATCCC 20

(2) INFORMATION FOR SEQ ID NO: 83: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 
TGTAAAACGA CGGCCAGTCA CTGAGGTGAT TGGCTGAA 38 



WO 95/16793 



PCT/US94/14746 




(2) INFORMATION FOR SEQ ID NO: 84: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:
TAGCCCTTAA GTGAGCCCG 19



(2) INFORMATION FOR SEQ ID NO: 85: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 38 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:
TGTAAAACGA CGGCCAGTTA CATTAGAGTA GTTGCAGA 38

(2) INFORMATION FOR SEQ ID NO:86: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:
AGGTCCTGAC TCTTCCATG 19
(2) INFORMATION FOR SEQ ID NO: 87:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 
TGTAAAACGA CGGCCAGTTT GGAAAATGAG TAACATGATT 40 

(2) INFORMATION FOR SEQ ID NO: 88: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88: 
TGTCATCACA GGAGGATAT 19 



(2) INFORMATION FOR SEQ ID NO: 89: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 
TGTAAAACGA CGGCCAGTCT TTCCCTTTGG TGAGGTGA 38 
(2) INFORMATION FOR SEQ ID NO: 90:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 

intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 
TACTCTGAGA CCTAGGCCCA 20 



(2) INFORMATION FOR SEQ ID NO: 91: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:91: 
TGTAAAACGA CGGCCAGTTC TCTTTTCCCC TTGGGATTAG 40 



(2) INFORMATION FOR SEQ ID NO: 92:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:
ACAAAGCTTC AACAATTTAC TCT 23 
(2) INFORMATION FOR SEQ ID NO: 93:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ix) FEATURE:

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 

intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 
TGTAAAACGA CGGCCAGTGT TTTATTTTCA AGTACTTCTA TGAATT 46 



(2) INFORMATION FOR SEQ ID NO: 94: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:
CAGCAACTGT TCAATGTATG AGCACT 26 

(2) INFORMATION FOR SEQ ID NO: 95: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:95: 
TGTAAAACGA CGGCCAGTGT GTGTGTTTTT GGCAAC 36 
(2) INFORMATION FOR SEQ ID NO: 96:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:
AACCTTATCT CCACCAGC 18



(2) INFORMATION FOR SEQ ID NO: 97: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 
TGTAAAACGA CGGCCAGTAG CCATGAGACA ATAAATCCTT G 41 

(2) INFORMATION FOR SEQ ID NO: 98: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 
TCCCAAATAA TGTGATGGAA TG 22 
(2) INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic

intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 
TGTAAAACGA CGGCCAGTAA GCTTCAGAAT CTCTTTT 37 



(2) INFORMATION FOR SEQ ID NO: 100: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 
TGGGTGTTTC CTGTGAGTGG ATT 23 



(2) INFORMATION FOR SEQ ID NO: 101: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 
TGTAAAACGA CGGCCAGTAC TTTGTGTGAA TGTACACCTG TG 42 
(2) INFORMATION FOR SEQ ID NO: 102:
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 
GAGAGCCTGA TAGAACATCT GTTG 24 



(2) INFORMATION FOR SEQ ID NO: 103: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 
TGTAAAACGA CGGCCAGTCT TTTTCTCCCC CTCCCACTA 39 



(2) INFORMATION FOR SEQ ID NO: 104: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104: 
TCTGGGCTCT CACGTCT 17 
(2) INFORMATION FOR SEQ ID NO: 105:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:
CTTATTCTGA GTCTCTCC 18

(2) INFORMATION FOR SEQ ID NO: 106:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:
TGTAAAACGA CGGCCAGTGT TTGCTCAGAG GCTGC 35

(2) INFORMATION FOR SEQ ID NO: 107:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:
GATGGTTCGT ACAGATTCCC G 21

(2) INFORMATION FOR SEQ ID NO: 108:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 
TGTAAAACGA CGGCCAGTTT ATTACAGAAT AAAGGAGGTA G 41 



(2) INFORMATION FOR SEQ ID NO: 109: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
TGTAAAACGA CGGCCAGTAA CCCACAAAAT TTGGCTAAG 39

(2) INFORMATION FOR SEQ ID NO: 110: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
TCTCCATTTC CAAAACCTTG 20 



(2) INFORMATION FOR SEQ ID NO: 111:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 
TGTCTCTAGT TCTGGTGC 18 

(2) INFORMATION FOR SEQ ID NO: 112: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 
TGTAAAACGA CGGCCAGTTG TTGTAGTAGC TCTGCTTG 38 

(2) INFORMATION FOR SEQ ID NO: 113:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
ATTTGTCCCA ACTGGTTGTA 20 
(2) INFORMATION FOR SEQ ID NO: 114:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME /KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 
TGTAAAACGA CGGCCAGTTC AGTTGAAATG TCAGAAGTG 39 



(2) INFORMATION FOR SEQ ID NO: 115: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
TGTAAAACGA CGGCCAGT 18 
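
Several of the longer primers in this listing (for example SEQ ID NOS: 83 and 89) begin with the same 18 bases that make up SEQ ID NO: 115. The following is a minimal sketch, not part of the original listing, of how an entry's declared LENGTH and that shared 5' prefix could be checked. The dictionary layout and the helper names (bases, length_matches, has_prefix) are assumptions introduced for illustration only; the sequences and lengths are copied from the entries named in the comments.

```python
# Illustrative consistency check; not part of the original sequence listing.
# Each value is (sequence as printed, declared LENGTH in base pairs).
PRIMERS = {
    83: ("TGTAAAACGA CGGCCAGTCA CTGAGGTGAT TGGCTGAA", 38),   # SEQ ID NO: 83
    84: ("TAGCCCTTAA GTGAGCCCG", 19),                         # SEQ ID NO: 84
    89: ("TGTAAAACGA CGGCCAGTCT TTCCCTTTGG TGAGGTGA", 38),   # SEQ ID NO: 89
    115: ("TGTAAAACGA CGGCCAGT", 18),                         # SEQ ID NO: 115
}


def bases(seq):
    """Remove the spacing between printed 10-base groups and uppercase."""
    return "".join(seq.split()).upper()


def length_matches(seq, declared):
    """True if the number of bases equals the declared LENGTH field."""
    return len(bases(seq)) == declared


def has_prefix(seq, prefix):
    """True if seq begins with prefix at its 5' end."""
    return bases(seq).startswith(bases(prefix))


if __name__ == "__main__":
    prefix = PRIMERS[115][0]
    for seq_id, (seq, declared) in sorted(PRIMERS.items()):
        print(seq_id, length_matches(seq, declared), has_prefix(seq, prefix))
```

A check of this kind is useful mainly for catching transcription errors, such as a trailing length figure that disagrees with the LENGTH field of the same entry.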

(2) INFORMATION FOR SEQ ID NO: 116: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 
CCGGCTGGAA ATTTTATTTG GAG 23 
(2) INFORMATION FOR SEQ ID NO: 117:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primers directed to genomic 
intron DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 
TGTAAAACGA CGGCCAGTAG GCACTGGAGA AATGGGATTT G 41 



(2) INFORMATION FOR SEQ ID NO: 118: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
TCCAGCACAC ATGCATGTAC CGAAAT 26

(2) INFORMATION FOR SEQ ID NO: 119:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "primer directed to genomic 
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: 
GTAGTCTGTG ATCTCCGTTT 20 
(2) INFORMATION FOR SEQ ID NO: 120:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:
TGTAAAACGA CGGCCAGTTA TGAGGTCCTG TCCTAG 36 



(2) INFORMATION FOR SEQ ID NO: 121: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:
ACCAGTGTAT GTTGGGATG 19

(2) INFORMATION FOR SEQ ID NO: 122: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1
(D) OTHER INFORMATION: /note= "primers directed to genomic
intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 
TGTAAAACGA CGGCCAGTGA AAGAAGAACA CATCCCACA 39 
(2) INFORMATION FOR SEQ ID NO: 123:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 770 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:

Met Ser Leu Arg Ile Lys Ala Leu Asp Ala Ser Val Val Asn Lys Ile
1               5                   10                  15

Ala Ala Gly Glu Ile Ile Ile Ser Pro Val Asn Ala Leu Lys Glu Met
            20                  25                  30

Met Glu Asn Ser Ile Asp Ala Asn Ala Thr Met Ile Asp Ile Leu Val
        35                  40                  45

Lys Glu Gly Gly Ile Lys Val Leu Gln Ile Thr Asp Asn Gly Ser Gly
    50                  55                  60

Ile Asn Lys Ala Asp Leu Pro Ile Leu Cys Glu Arg Phe Thr Thr Ser
65                  70                  75                  80

Lys Leu Gln Lys Phe Glu Asp Leu Ser Gln Ile Gln Thr Tyr Gly Phe
                85                  90                  95

Arg Gly Glu Ala Leu Ala Ser Ile Ser His Val Ala Arg Val Thr Val
            100                 105                 110

Thr Thr Lys Val Lys Glu Asp Arg Cys Ala Trp Arg Val Ser Tyr Ala
        115                 120                 125

Glu Gly Lys Met Leu Glu Ser Pro Lys Pro Val Ala Gly Lys Asp Gly
    130                 135                 140

Thr Thr Ile Leu Val Glu Asp Leu Phe Phe Asn Ile Pro Ser Arg Leu
145                 150                 155                 160

Arg Ala Leu Arg Ser His Asn Asp Glu Tyr Ser Lys Ile Leu Asp Val
                165                 170                 175

Val Gly Arg Tyr Ala Ile His Ser Lys Asp Ile Gly Phe Ser Cys Lys
            180                 185                 190

Lys Phe Gly Asp Ser Asn Tyr Ser Leu Ser Val Lys Pro Ser Tyr Thr
        195                 200                 205

Val Gln Asp Arg Ile Arg Thr Val Phe Asn Lys Ser Val Ala Ser Asn
    210                 215                 220

Leu Ile Thr Phe His Ile Ser Lys Val Glu Asp Leu Asn Leu Glu Ser
225                 230                 235                 240

Val Asp Gly Lys Val Cys Asn Leu Asn Phe Ile Ser Lys Lys Ser Ile
                245                 250                 255

Ser Leu Ile Phe Phe Ile Asn Asn Arg Leu Val Thr Cys Asp Leu Leu
            260                 265                 270

Arg Arg Ala Leu Asn Ser Val Tyr Ser Asn Tyr Leu Pro Lys Gly Phe
        275                 280                 285

Arg Pro Phe Ile Tyr Leu Gly Ile Val Ile Asp Pro Ala Ala Val Asp
    290                 295                 300

Val Asn Val His Pro Thr Lys Arg Glu Val Arg Phe Leu Ser Gln Asp
305                 310                 315                 320

Glu Ile Ile Glu Lys Ile Ala Asn Gln Leu His Ala Glu Leu Ser Ala
                325                 330                 335

Ile Asp Thr Ser Arg Thr Phe Lys Ala Ser Ser Ile Ser Thr Asn Lys
            340                 345                 350

Pro Glu Ser Leu Ile Pro Phe Asn Asp Thr Ile Glu Ser Asp Arg Asn
        355                 360                 365

Arg Lys Ser Leu Arg Gln Ala Gln Val Val Glu Asn Ser Tyr Thr Thr
    370                 375                 380

Ala Asn Ser Gln Leu Arg Lys Ala Lys Arg Gln Glu Asn Lys Leu Val
385                 390                 395                 400

Arg Ile Asp Ala Ser Gln Ala Lys Ile Thr Ser Phe Leu Ser Ser Ser
                405                 410                 415

Gln Gln Phe Asn Phe Glu Gly Ser Ser Thr Lys Arg Gln Leu Ser Glu
            420                 425                 430

Pro Lys Val Thr Asn Val Ser His Ser Gln Glu Ala Glu Lys Leu Thr
        435                 440                 445

Leu Asn Glu Ser Glu Gln Pro Arg Asp Ala Asn Thr Ile Asn Asp Asn
    450                 455                 460

Asp Leu Lys Asp Gln Pro Lys Lys Lys Gln Lys Gln Leu Gly Asp Tyr
465                 470                 475                 480

Lys Val Pro Ser Ile Ala Asp Asp Glu Lys Asn Ala Leu Pro Ile Ser
                485                 490                 495

Lys Asp Gly Tyr Ile Arg Val Pro Lys Glu Arg Val Asn Val Asn Leu
            500                 505                 510

Thr Ser Ile Lys Lys Leu Arg Glu Lys Val Asp Asp Ser Ile His Arg
        515                 520                 525

Glu Leu Thr Asp Ile Phe Ala Asn Leu Asn Tyr Val Gly Val Val Asp
    530                 535                 540

Glu Glu Arg Arg Leu Ala Ala Ile Gln His Asp Leu Lys Leu Phe Leu
545                 550                 555                 560

Ile Asp Tyr Gly Ser Val Cys Tyr Glu Leu Phe Tyr Gln Ile Gly Leu
                565                 570                 575

Thr Asp Phe Ala Asn Phe Gly Lys Ile Asn Leu Gln Ser Thr Asn Val
            580                 585                 590

Ser Asp Asp Ile Val Leu Tyr Asn Leu Leu Ser Glu Phe Asp Glu Leu
        595                 600                 605

Asn Asp Asp Ala Ser Lys Glu Lys Ile Ile Ser Lys Ile Trp Asp Met
    610                 615                 620

Ser Ser Met Leu Asn Glu Tyr Tyr Ser Ile Glu Leu Val Asn Asp Gly
625                 630                 635                 640

Leu Asp Asn Asp Leu Lys Ser Val Lys Leu Lys Ser Leu Pro Leu Leu
                645                 650                 655

Leu Lys Gly Tyr Ile Pro Ser Leu Val Lys Leu Pro Phe Phe Ile Tyr
            660                 665                 670

Arg Leu Gly Lys Glu Val Asp Trp Glu Asp Glu Gln Glu Cys Leu Asp
        675                 680                 685

Gly Ile Leu Arg Glu Ile Ala Leu Leu Tyr Ile Pro Asp Met Val Pro
    690                 695                 700

Lys Val Asp Thr Leu Asp Ala Ser Leu Ser Glu Asp Glu Lys Ala Gln
705                 710                 715                 720

Phe Ile Asn Arg Lys Glu His Ile Ser Ser Leu Leu Glu His Val Leu
                725                 730                 735

Phe Pro Cys Ile Lys Arg Arg Phe Leu Ala Pro Arg His Ile Leu Lys
            740                 745                 750

Asp Val Val Glu Ile Ala Asn Leu Pro Asp Leu Tyr Lys Val Phe Glu
        755                 760                 765

Arg Cys
    770

(2) INFORMATION FOR SEQ ID NO: 124:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 64 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:

Val Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn Ala
1               5                   10                  15

Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Phe Thr Ser Ile
            20                  25                  30

Gln Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln Ile Gln Asp
        35                  40                  45

Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile Val Cys Glu Arg
    50                  55                  60



(2) INFORMATION FOR SEQ ID NO: 125: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 64 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:

Val Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn Ala
1               5                   10                  15

Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Ser Thr Ser Ile
            20                  25                  30

Gln Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln Ile Gln Asp
        35                  40                  45

Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile Val Cys Glu Arg
    50                  55                  60



(2) INFORMATION FOR SEQ ID NO: 126:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

Pro Ala Asn Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys
1               5                   10                  15

Ser Thr Asn Ile Gln Val Val Val Lys Glu Gly Gly Leu Lys Leu Ile
            20                  25                  30

Gln Ile Gln Asp Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile
        35                  40                  45

Val Cys Glu Arg
    50

(2) INFORMATION FOR SEQ ID NO: 127: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:

Val Asn Lys Ile Ala Ala Gly Glu Ile Ile Ile Ser Pro Val Asn Ala
1               5                   10                  15

Leu Lys Glu Met Met Glu Asn Ser Ile Asp Ala Asn Ala Thr Met Ile
            20                  25                  30

Asp Ile Leu Val Lys Glu Gly Gly Ile Lys Val Leu Gln Ile Thr Asp
        35                  40                  45

Asn Gly Ser Gly Ile Asn Lys Ala Asp Leu Pro Ile Leu Cys Glu Arg
    50                  55                  60
(2) INFORMATION FOR SEQ ID NO: 128:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:

Val His Arg Ile Thr Ser Gly Gln Val Ile Thr Asp Leu Thr Thr Ala
1               5                   10                  15

Val Lys Glu Leu Val Asp Asn Ser Ile Asp Ala Asn Ala Asn Gln Ile
            20                  25                  30

Glu Ile Ile Phe Lys Asp Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp
        35                  40                  45

Asn Gly Asp Gly Ile Asp Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys
    50                  55                  60

(2) INFORMATION FOR SEQ ID NO: 129: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129: 

Ala Asn Gln Ile Ala Ala Gly Glu Val Val Glu Arg Pro Ala Ser Val
1 5 10 15

Val Lys Glu Leu Val Glu Asn Ser Leu Asp Ala Gly Ala Thr Arg Ile
20 25 30

Asp Ile Asp Ile Glu Arg Gly Gly Ala Lys Leu Ile Arg Ile Arg Asp
35 40 45

Asn Gly Cys Gly Ile Lys Lys Asp Glu Leu Ala Leu Ala Leu Ala Arg
50 55 60



(2) INFORMATION FOR SEQ ID NO: 130: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130: 

Ala Asn Gln Ile Ala Ala Gly Glu Val Val Glu Arg Pro Ala Ser Val
1 5 10 15

Val Lys Glu Leu Val Glu Asn Ser Leu Asp Ala Gly Ala Thr Arg Val
20 25 30

Asp Ile Asp Ile Glu Arg Gly Gly Ala Lys Leu Ile Arg Ile Arg Asp
35 40 45

Asn Gly Cys Gly Ile Lys Lys Glu Glu Leu Ala Leu Ala Leu Ala Arg
50 55 60



(2) INFORMATION FOR SEQ ID NO: 131: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 

Ala Asn Gln Ile Ala Ala Gly Glu Val Ile Glu Arg Pro Ala Ser Val
1 5 10 15

Cys Lys Glu Leu Val Glu Asn Ala Ile Asp Ala Gly Ser Ser Gln Ile
20 25 30

Ile Ile Glu Ile Glu Glu Ala Gly Leu Lys Lys Val Gln Ile Thr Asp
35 40 45

Asn Gly His Gly Ile Ala His Asp Glu Val Glu Leu Ala Leu Arg Arg
50 55 60

(2) INFORMATION FOR SEQ ID NO: 132:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2687 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(viii) POSITION IN GENOME: 

(B) MAP POSITION: 7q 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132: 

CCATGGAGCG AGCTGAGAGC TCGAGTACAG AACCTGCTAA GGCCATCAAA CCTATTGATC 60 

GGAAGTCAGT CCATCAGATT TGCTCTGGGC AGGTGGTACT GAGTCTAAGC ACTGCGGTAA 120 

AGGAGTTAGT AGAAAACAGT CTGGATGCTG GTGCCACTAA TATTGATCTA AAGCTTAAGG 180 

ACTATGGAGT GGATCTTATT GAAGTTTCAG ACAATGGATG TGGGGTAGAA GAAGAAAACT 240 

TCGAAGGCTT AACTCTGAAA CATCACACAT CTAAGATTCA AGAGTTTGCC GACCTAACTC 300 

AGGTTGAAAC TTTTGGCTTT CGGGGGGAAG CTCTGAGCTC ACTTTGTGCA CTGAGCGATG 360 

TCACCATTTC TACCTGCCAC GCATCGGCGA AGGTTGGAAC TCGACTGATG TTTGATCACA 420 

ATGGGAAAAT TATCCAGAAA ACCCCCTACC CCCGCCCCAG AGGGACCACA GTCAGCGTGC 480 

AGCAGTTATT TTCCACACTA CCTGTGCGCC ATAAGGAATT TCAAAGGAAT ATTAAGAAGG 540 

AGTATGCCAA AATGGTCCAG GTCTTACATG CATACTGTAT CATTTCAGCA GGCATCCGTG 600 

TAAGTTGCAC CAATCAGCTT GGACAAGGAA AACGACAGCC TGTGGTATGC ACAGGTGGAA 660

GCCCCAGCAT AAAGGAAAAT ATCGGCTCTG TGTTTGGGCA GAAGCAGTTG CAAAGCCTCA 720
TTCCTTTTGT TCAGCTGCCC CCTAGTGACT CCGTGTGTGA AGAGTACGGT TTGAGCTGTT 780 
CGGATGCTCT GCATAATCTT TTTTACATCT CAGGTTTCAT TTCACAATGC ACGCATGGAG 840 
TTGGAAGGAG TTCAACAGAC AGACAGTTTT TCTTTATCAA CCGGCGGCCT TGTGACCCAG 900 
CAAAGGTCTG CAGACTCGTG AATGAGGTCT ACCACATGTA TAATCGACAC CAGTATCCAT 960 
TTGTTGTTCT TAACATTTCT GTTGATTCAG AATGCGTTGA TATCAATGTT ACTCCAGATA 1020 
AAAGGCAAAT TTTGCTACAA GAGGAAAAGC TTTTGTTGGC AGTTTTAAAG ACCTCTTTGA 1080 
TAGGAATGTT TGATAGTGAT GTCAACAAGC TAAATGTCAG TCAGCAGCCA CTGCTGGATG 1140 
TTGAAGGTAA CTTAATAAAA ATGCATGCAG CGGATTTGGA AAAGCCCATG GTAGAAAAGC 1200 
AGGATCAATC CCCTTCATTA AGGACTGGAG AAGAAAAAAA AGACGTGTCC ATTTCCAGAC 1260 
TGCGAGAGGC CTTTTCTCTT CGTCACACAA CAGAGAACAA GCCTCACAGC CCAAAGACTC 1320 
CAGAACCAAG AAGGAGCCCT CTAGGACAGA AAAGGGGTAT GCTGTCTTCT AGCACTTCAG 1380 
GTGCCATCTC TGACAAAGGC GTCCTGAGAT CTCAGAAAGA GGCAGTGAGT TCCAGTCACG 1440 
GACCCAGTGA CCCTACGGAC AGAGCGGAGG TGGAGAAGGA CTCGGGGCAC GGCAGCACTT 1500 
CCGTGGATTC TGAGGGGTTC AGCATCCCAG ACACGGGCAG TCACTGCAGC AGCGAGTATG 1560 
CGGCCAGCTC CCCAGGGGAC AGGGGCTCGC AGGAACATGT GGACTCTCAG GAGAAAGCGC 1620 
CTGAAACTGA CGACTCTTTT TCAGATGTGG ACTGCCATTC AAACCAGGAA GATACCGGAT 1680 
GTAAATTTCG AGTTTTGCCT CAGCCAACTA ATCTCGCAAC CCCAAACACA AAGCGTTTTA 1740 
AAAAAGAAGA AATTCTTTCC AGTTCTGACA TTTGTCAAAA GTTAGTAAAT ACTCAGGACA 1800 
TGTCAGCCTC TCAGGTTGAT TGAGCTGTGA AAATTAATAA GAAAGTTGTG CCCCTGGACT 1860 
TTTCTATGAG TTCTTTAGCT AAACGAATAA AGCAGTTACA TCATGAAGCA CAGCAAAGTG 1920 
AAGGGGAACA GAATTACAGG AAGTTTAGGG CAAAGATTTG TCCTGGAGAA AATCAAGCAG 1980 
CCGAAGATGA ACTAAGAAAA GAGATAAGTA AAACGATGTT TGCAGAAATG GAAATCATTG 2040 
GTCAGTTTAA CCTGGGATTT ATAATAACCA AACTGAATGA GGATATCTTC ATAGTGGACC 2100
AGCATGCCAC GGACGAGAAG TATAACTTCG AGATGCTGCA GCAGCACACC GTGCTCCAGG 2160 
GGCAGAGGCT CATAGCACCT CAGACTCTCA ACTTAACTGC TGTTAATGAA GCTGTTCTGA 2220 
TAGAAAATCT GGAAATATTT AGAAAGAATG GCTTTGATTT TGTTATCGAT GAAAATGCTC 2280 
CAGTCACTGA AAGGGCTAAA CTGATTTCCT TGCCAACTAG TAAAAACTGG ACCTTCGGAC 2340 
CCCAGGACGT CGATGAACTG ATCTTCATGC TGAGCGACAG CCCTGGGGTC ATGTGCCGCC 2400 
CTTCCCGAGT CAAGCAGATG TTTGCCTCCA GAGCCTGCCG GAAGTCGGTG ATGATTGGGA 2460 
CTGCTCTCAA CACAAGCGAA TGAAGAAACT GATCACCCAC ATGGGGGAGA TGGGCCACCC 2520 
CTGGAACTGT CCCCATGGAA GGCCACCATG AGACACATCG CCAACCTGGG TGTCATTTCT 2580 
CAGAACTGAC CGTAGTCACT GTATGGAATA ATTGGTTTTA TCGCAGATTT TTATGTTTTG 2640
AAAGACAGAG TCTTCACTAA CCTTTTTTGT TTTAAAATGA AACCTGC 2687 



(2) INFORMATION FOR SEQ ID NO: 133:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 862 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:

Met Glu Arg Ala Glu Ser Ser Ser Thr Glu Pro Ala Lys Ala Ile Lys
1 5 10 15

Pro Ile Asp Arg Lys Ser Val His Gln Ile Cys Ser Gly Gln Val Val
20 25 30

Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Val Glu Asn Ser Leu Asp
35 40 45

Ala Gly Ala Thr Asn Ile Asp Leu Lys Leu Lys Asp Tyr Gly Val Asp
50 55 60

Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val Glu Glu Glu Asn Phe
65 70 75 80

Glu Gly Leu Thr Leu Lys His His Thr Ser Lys Ile Gln Glu Phe Ala
85 90 95

Asp Leu Thr Gln Val Glu Thr Phe Gly Phe Arg Gly Glu Ala Leu Ser
100 105 110

Ser Leu Cys Ala Leu Ser Asp Val Thr Ile Ser Thr Cys His Ala Ser
115 120 125

Ala Lys Val Gly Thr Arg Leu Met Phe Asp His Asn Gly Lys Ile Ile
130 135 140

Gln Lys Thr Pro Tyr Pro Arg Pro Arg Gly Thr Thr Val Ser Val Gln
145 150 155 160

Gln Leu Phe Ser Thr Leu Pro Val Arg His Lys Glu Phe Gln Arg Asn
165 170 175

Ile Lys Lys Glu Tyr Ala Lys Met Val Gln Val Leu His Ala Tyr Cys
180 185 190

Ile Ile Ser Ala Gly Ile Arg Val Ser Cys Thr Asn Gln Leu Gly Gln
195 200 205

Gly Lys Arg Gln Pro Val Val Cys Ile Gly Gly Ser Pro Ser Ile Lys
210 215 220

Glu Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile
225 230 235 240

Pro Phe Val Gln Leu Pro Pro Ser Asp Ser Val Cys Glu Glu Tyr Gly
245 250 255

Leu Ser Cys Ser Asp Ala Leu His Asn Leu Phe Tyr Ile Ser Gly Phe
260 265 270

Ile Ser Gln Cys Thr His Gly Val Gly Arg Ser Ser Thr Asp Arg Gln
275 280 285

Phe Phe Phe Ile Asn Arg Arg Pro Cys Asp Pro Ala Lys Val Cys Arg
290 295 300

Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg His Gln Tyr Pro Phe
305 310 315 320

Val Val Leu Asn Ile Ser Val Asp Ser Glu Cys Val Asp Ile Asn Val
325 330 335

Thr Pro Asp Lys Arg Gln Ile Leu Leu Gln Glu Glu Lys Leu Leu Leu
340 345 350

Ala Val Leu Lys Thr Ser Leu Ile Gly Met Phe Asp Ser Asp Val Asn
355 360 365

Lys Leu Asn Val Ser Gln Gln Pro Leu Leu Asp Val Glu Gly Asn Leu
370 375 380

Ile Lys Met His Ala Ala Asp Leu Glu Lys Pro Met Val Glu His Gln
385 390 395 400

Asp Gln Ser Pro Ser Leu Arg Ile Gly Glu Glu Lys Lys Asp Val Ser
405 410 415

Ile Ser Arg Leu Arg Glu Ala Phe Ser Leu Arg His Thr Thr Glu Asn
420 425 430

Lys Pro His Ser Pro Lys Thr Pro Glu Pro Arg Arg Ser Pro Leu Gly
435 440 445

Gln Lys Arg Gly Met Leu Ser Ser Ser Thr Ser Gly Ala Ile Ser Asp
450 455 460

Lys Gly Val Leu Arg Ser Gln Lys Glu Ala Val Ser Ser Ser His Gly
465 470 475 480

Pro Ser Asp Pro Thr Asp Arg Ala Glu Val Glu Lys Asp Ser Gly His
485 490 495

Gly Ser Thr Ser Val Asp Ser Glu Gly Phe Ser Ile Pro Asp Thr Gly
500 505 510

Ser His Cys Ser Ser Glu Tyr Ala Ala Ser Ser Pro Gly Asp Arg Gly
515 520 525

Ser Gln Glu His Val Asp Ser Gln Glu Lys Ala Pro Glu Thr Asp Asp
530 535 540

Ser Phe Ser Asp Val Asp Cys His Ser Asn Gln Glu Asp Thr Gly Cys
545 550 555 560

Lys Phe Arg Val Leu Pro Gln Pro Ile Asn Leu Ala Thr Pro Asn Thr
565 570 575

Lys Arg Phe Lys Lys Glu Glu Ile Leu Ser Ser Ser Asp Ile Cys Gln
580 585 590

Lys Leu Val Asn Thr Gln Asp Met Ser Ala Ser Gln Val Asp Val Ala
595 600 605

Val Lys Ile Asn Lys Lys Val Val Pro Leu Asp Phe Ser Met Ser Ser
610 615 620

Leu Ala Lys Arg Ile Lys Gln Leu His His Glu Ala Gln Gln Ser Glu
625 630 635 640

Gly Glu Gln Asn Tyr Arg Lys Phe Arg Ala Lys Ile Cys Pro Gly Glu
645 650 655

Asn Gln Ala Ala Glu Asp Glu Leu Arg Lys Glu Ile Ser Lys Thr Met
660 665 670

Phe Ala Glu Met Glu Ile Ile Gly Gln Phe Asn Leu Gly Phe Ile Ile
675 680 685

Thr Lys Leu Asn Glu Asp Ile Phe Ile Val Asp Gln His Ala Thr Asp
690 695 700

Glu Lys Tyr Asn Phe Glu Met Leu Gln Gln His Thr Val Leu Gln Gly
705 710 715 720

Gln Arg Leu Ile Ala Pro Gln Thr Leu Asn Leu Thr Ala Val Asn Glu
725 730 735

Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly Phe Asp
740 745 750

Phe Val Ile Asp Glu Asn Ala Pro Val Thr Glu Arg Ala Lys Leu Ile
755 760 765

Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro Gln Asp Val Asp
770 775 780

Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly Val Met Cys Arg Pro
785 790 795 800

Ser Arg Val Lys Gln Met Phe Ala Ser Arg Ala Cys Arg Lys Ser Val
805 810 815

Met Ile Gly Thr Ala Leu Asn Thr Ser Glu Met Lys Lys Leu Ile Thr
820 825 830

His Met Gly Glu Met Gly His Pro Trp Asn Cys Pro His Gly Arg Pro
835 840 845

Thr Met Arg His Ile Ala Asn Leu Gly Val Ile Ser Gln Asn
850 855 860

(2) INFORMATION FOR SEQ ID NO: 134: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 903 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 

Met Phe His His Ile Glu Asn Leu Leu Ile Glu Thr Glu Lys Arg Cys
1 5 10 15

Lys Gln Lys Glu Gln Arg Tyr Ile Pro Val Lys Tyr Leu Phe Ser Met
20 25 30

Thr Gln Ile His Gln Ile Asn Asp Ile Asp Val His Arg Ile Thr Ser
35 40 45

Gly Gln Val Ile Thr Asp Leu Thr Thr Ala Val Lys Glu Leu Val Asp
50 55 60

Asn Ser Ile Asp Ala Asn Ala Asn Gln Ile Glu Ile Ile Phe Lys Asp
65 70 75 80

Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp Asn Gly Asp Gly Ile Asp
85 90 95

Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys His Tyr Thr Ser Lys Ile
100 105 110

Ala Lys Phe Gln Asp Val Ala Lys Val Gln Thr Leu Gly Phe Arg Gly
115 120 125

Glu Ala Leu Ser Ser Leu Cys Gly Ile Ala Lys Leu Ser Val Ile Thr
130 135 140

Thr Thr Ser Pro Pro Lys Ala Asp Lys Leu Glu Tyr Asp Met Val Gly
145 150 155 160

His Ile Thr Ser Lys Thr Thr Ser Arg Asn Lys Gly Thr Thr Val Leu
165 170 175

Val Ser Gln Leu Phe His Asn Leu Pro Val Arg Gln Lys Glu Phe Ser
180 185 190

Lys Thr Phe Lys Arg Gln Phe Thr Lys Cys Leu Thr Val Ile Gln Gly
195 200 205

Tyr Ala Ile Ile Asn Ala Ala Ile Lys Phe Ser Val Trp Asn Ile Thr
210 215 220

Pro Lys Gly Lys Lys Asn Leu Ile Leu Ser Thr Met Arg Asn Ser Ser
225 230 235 240

Met Arg Lys Asn Ile Ser Ser Val Phe Gly Ala Gly Gly Met Phe Gly
245 250 255

Leu Glu Glu Val Asp Leu Val Leu Asp Leu Asn Pro Phe Lys Asn Arg
260 265 270

Met Leu Gly Lys Tyr Thr Asp Asp Pro Asp Phe Leu Asp Leu Asp Tyr
275 280 285

Lys Ile Arg Val Lys Gly Tyr Ile Ser Gln Asn Ser Phe Gly Cys Gly
290 295 300

Arg Asn Ser Lys Asp Arg Gln Phe Ile Tyr Val Asn Lys Arg Pro Val
305 310 315 320

Glu Tyr Ser Thr Leu Leu Lys Cys Cys Asn Glu Val Tyr Lys Thr Phe
325 330 335

Asn Asn Val Gln Phe Pro Ala Val Phe Leu Asn Leu Glu Leu Pro Met
340 345 350

Ser Leu Ile Asp Val Asn Val Thr Pro Asp Lys Arg Val Ile Leu Leu
355 360 365

His Asn Glu Arg Ala Val Ile Asp Ile Phe Lys Thr Thr Leu Ser Asp
370 375 380

Tyr Tyr Asn Arg Gln Glu Leu Ala Leu Pro Lys Arg Met Cys Ser Gln
385 390 395 400

Ser Glu Gln Gln Ala Gln Lys Arg Leu Lys Thr Glu Val Phe Asp Asp
405 410 415

Arg Ser Thr Thr His Glu Ser Asp Asn Glu Asn Tyr His Thr Ala Arg
420 425 430

Ser Glu Ser Asn Gln Ser Asn His Ala His Phe Asn Ser Thr Thr Gly
435 440 445

Val Ile Asp Lys Ser Asn Gly Thr Glu Leu Thr Ser Val Met Asp Gly
450 455 460

Asn Tyr Thr Asn Val Thr Asp Val Ile Gly Ser Glu Cys Glu Val Ser
465 470 475 480

Val Asp Ser Ser Val Val Leu Asp Glu Gly Asn Ser Ser Thr Pro Thr
485 490 495

Lys Lys Leu Pro Ser Ile Lys Thr Asp Ser Gln Asn Leu Ser Asp Leu
500 505 510

Asn Leu Asn Asn Phe Ser Asn Pro Glu Phe Gln Asn Ile Thr Ser Pro
515 520 525

Asp Lys Ala Arg Ser Leu Glu Lys Val Val Glu Glu Pro Val Tyr Phe
530 535 540

Asp Ile Asp Gly Glu Lys Phe Gln Glu Lys Ala Val Leu Ser Gln Ala
545 550 555 560

Asp Gly Leu Val Phe Val Asp Asn Glu Cys His Glu His Thr Asn Asp
565 570 575

Cys Cys His Gln Glu Arg Arg Gly Ser Thr Asp Ile Glu Gln Asp Asp
580 585 590

Glu Ala Asp Ser Ile Tyr Ala Glu Ile Glu Pro Val Glu Ile Asn Val
595 600 605

Arg Thr Pro Leu Lys Asn Ser Arg Lys Ser Ile Ser Lys Asp Asn Tyr
610 615 620

Arg Ser Leu Ser Asp Gly Leu Thr His Arg Lys Phe Glu Asp Glu Ile
625 630 635 640

Leu Glu Tyr Asn Leu Ser Thr Lys Asn Phe Lys Glu Ile Ser Lys Asn
645 650 655

Gly Lys Gln Met Ser Ser Ile Ile Ser Lys Arg Lys Ser Glu Ala Gln
660 665 670

Glu Asn Ile Ile Lys Asn Lys Asp Glu Leu Glu Asp Phe Glu Gln Gly
675 680 685

Glu Lys Tyr Leu Thr Leu Thr Val Ser Lys Asn Asp Phe Lys Lys Met
690 695 700

Glu Val Val Gly Gln Phe Asn Leu Gly Phe Ile Ile Val Thr Arg Lys
705 710 715 720

Val Asp Asn Lys Ser Lys Leu Phe Ile Val Asp Gln His Ala Ser Asp
725 730 735

Glu Lys Tyr Asn Phe Glu Thr Leu Gln Ala Val Thr Val Phe Lys Ser
740 745 750

Gln Lys Leu Ile Ile Pro Gln Pro Val Glu Leu Ser Val Ile Asp Glu
755 760 765

Leu Val Val Leu Asp Asn Leu Pro Val Phe Glu Lys Asn Gly Phe Lys
770 775 780

Leu Lys Ile Asp Glu Glu Glu Glu Phe Gly Ser Arg Val Lys Leu Leu
785 790 795 800

Ser Leu Pro Thr Ser Lys Gln Thr Leu Phe Asp Leu Gly Asp Phe Asn
805 810 815

Glu Leu Ile His Leu Ile Lys Glu Asp Gly Gly Leu Arg Arg Asp Asn
820 825 830

Ile Arg Cys Ser Lys Ile Arg Ser Met Phe Ala Met Arg Ala Cys Arg
835 840 845

Ser Ser Ile Met Ile Gly Lys Pro Leu Asn Lys Lys Thr Met Thr Arg
850 855 860

Val Val His Asn Leu Ser Glu Leu Asp Lys Pro Trp Asn Cys Pro His
865 870 875 880

Gly Arg Pro Thr Met Arg His Leu Met Glu Ile Arg Asp Trp Ser Ser
885 890 895

Phe Ser Lys Asp Tyr Glu Ile
900



(2) INFORMATION FOR SEQ ID NO: 135: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2577 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 



TTCCGGCCAA TGCTATCAAA GAGATGATAG AAAACTGTTT AGATGCAAAA TCTACAAATA 60

TTCAAGTGGT TGTTAAGGAA GGTGGCCTGA AGCTAATTCA GATCCAAGAC AATGGNACNN 120

GAATCAGGAA GGAAGATCTG GATATTGTGT GTGAGAGGTT CACTACGAGT AAACTGCAGA 180

CTTTTGAGGA TTTAGCCAGT ATTTCTACCT ATGGCTTTCG TGGTGAGCAT TTGGCAAGCA 240

TAAGTCATGT GGCCCATGTC ACTATTACAA CCAAAACAGC TGATGGGAAA TGTGCGTACA 300

GAGCAAGTTA CTCAGATGGA AAGCTGCAAG CCCCTCCTAA ACCCTGTGCA GGCAACCAGG 360

GCACCCTGAT CACGGTGGAA GACCTTTTTT ACAACATAAT CACAAGGAGG AAAGCTTTAA 420

AAAATCCAAG TGAAGAGTAC GGAAAAATTT TGGAAGTTGT TGGCAGGTAT TCAATACACA 480

ATTCAGGCAT TAGTATCTCA GTTAAAAAAC AAGGTGAGAC AGTATCTGAT GTCAGAACAC 540

TGCCCAATGC CACAACCGTG GACANNNTTC GCTCCATCTT TGGAAATGCG GTTAGTCGAG 600

AACTGATAGA AGTTGGGTGT GAGNNNAAAA CCCTAGCTTT CAAAATGAAT GGCTATATAT 660

CGAATGCAAA GTATTCAGTG AANNNGTGCA TTTTCCTACT CTTCATCAAC CACCGTCTGG 720

TAGAATCAGC TGCCTTGAGA AAAGCCATTG AAACTGTATA TGCAGCATAC TTGCCAAAAA 780

CACACACCCA TTCCTGTACC TCAGTTTGAA ATCAGCCCTC AGAACGTGAC GTCAATGTAC 840

ACCCCACCAA GACAGAAGTT CATTTTCTGC ACGAGGAGAG CATTCTGCAG CGTGTGCAGC 900

AGCACATTGA GAGCAAGCTG CTGGGCTCCA ATTCCTCCAG GATGTATTTC ACCCAGACCT 960

TGCTTCCAGG ACTTGCTGGG CCTCTGGGGA GGCAGCTAGA CCCACGACAG GGGTGGCTTC 1020

CTCATCCACT AGTGGAAGTG GCGACAAGGT CTACGCTTAC CAGATGTCGC GTACGGACTC 1080

CCGGGATCAG AAGCTTGACG CCTTTCTGCA GCCTGTAACC AGCCTTGTGC CCAGCCAGCC 1140

CCAGGACCCT CGCCCTGTCC GAGGGGCCAG GACAGAGGGC TCTCCTGAAA GGGCCACGCG 1200

GGAGGATGAG GAGATGCTTG CTCTCCCAGC CCCCGCTGAA GCAGCTGCTG AGAGTGAGAA 1260

CTTGGAGAGG GAATCACTAA TGGAGACTTC AGACGCAGCC CAGAAAGCGG CACCCACTTC 1320

CAGTCCAGGA AGCTCCAGAA AGAGTCATCG GGAGGACTCT GATGTGGAAA TGGTGGAAAA 1380

TGCTTCCGGG AAGGAAATGA CAGCTGCTTG CTACCCCAGG AGGAGGATCA TTAACCTCAC 1440

CAGCGTCTTG AGTCTCCAGG AAGAGATTAG TGAGCGGTGC CATGAGACTC TCCGGGAGAT 1500

ACTCCGTAAC CATTCCTTTG TGGGCTGTGT GAATCCTCAG TGGGCCTTGG CACAGCACCA 1560

GACCAAGCTA TACCTCCTCA ACACTACCAA GCTCAGTGAA GAGCTGTTCT ACCAGATACT 1620

CATTTATGAT TTTGCCAACT TTGGTGTTCT GAGGTTATCG GAACCAGCGC CACTCTTCGA 1680

CCTGGCCATG CTGGCTTAGA CAGTCCTGAA AGTGGCTGGA CAGAGGACGA CGGCCCGAAG 1740

AAGGGCTTGC AGAGTACATT GTCGAGTTTC TGAAGAGAAG CGAGATGCTT GCAGACTATT 1800

CTCTGTGAGA TCGATGAGAA GGGAACCTGA TTGATTACTC TTCTGATGAC AGCTATGTGC 1860 

CACCTTTGGA GGGACTGCCT ATCTTCATTC TTCGACTGGC CACTGAGGTG AATTGGGTGA 1920 

AGAAAAGGAG TGTTTTGAAA GTCTCAGTAA AGAATGTGCT ATGTTTTACT CCATTCGGAA 1980 

GCAGTATATA CTGGAGGAGT CGACCCTCTC AGGCCAGCAG AGTGACATGC CTGGCTCCAC 2040 

GTCAAAGCCC TGGAAGTGGA CTGTGGAGCA CATTATCTAT AAAGCCTTCC GCTCACACCT 2100 

CCTACCTCCG AAGCATTTCA CAGAAGATGG CAATGTCCTG CAGCTTGCCA ACCTGCCAGA 2160 

TCTATACAAA GTCTTTGAGC GGTGTTAAAT ACAATCATAG CCACCGTAGA GACTGCATGA 2220

CCATCCAAGG CGAAGTGTAT GGTACTAATC TGGAAGCCAC AGAATAGGAC ACTTGGTTTC 2280 

AGCTCCAGGG TTTTCAGTGC TCACTATTCT TGTTCTGTAT CCCAGTATTG GTGCTGCAAC 2340 

TTAATGTACT TCACCTGTGG ATTGGCTGCA AATAAACTCA CGTGTATTGG AAAAAAGGAA 2400 

TTCCTGCAGC CCGGGGGATC CACTAGTTCT AGAGCGGCCG CCACCGGTGG AGCTCCAGCT 2460 

TTTGTTCCCT TTAGTGAGGG TTAATTTCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 2520 

CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAA 2577 



(2) INFORMATION FOR SEQ ID NO: 136: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 728 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:

Pro Ala Asn Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys
1 5 10 15

Ser Thr Asn Ile Gln Val Val Val Lys Glu Gly Gly Leu Lys Leu Ile
20 25 30

Gln Ile Gln Asp Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile
35 40 45

Val Cys Glu Arg Phe Thr Thr Ser Lys Leu Gln Thr Phe Glu Asp Leu
50 55 60

Ala Ser Ile Ser Thr Tyr Gly Phe Arg Gly Glu His Leu Ala Ser Ile
65 70 75 80

Ser His Val Ala His Val Thr Ile Thr Thr Lys Thr Ala Asp Gly Lys
85 90 95

Cys Ala Tyr Arg Ala Ser Tyr Ser Asp Gly Lys Leu Gln Ala Pro Pro
100 105 110

Lys Pro Cys Ala Gly Asn Gln Gly Thr Leu Ile Thr Val Glu Asp Leu
115 120 125

Phe Tyr Asn Ile Ile Thr Arg Arg Lys Ala Leu Lys Asn Pro Ser Glu
130 135 140

Glu Tyr Gly Lys Ile Leu Glu Val Val Gly Arg Tyr Ser Ile His Asn
145 150 155 160

Ser Gly Ile Ser Ile Ser Val Lys Lys Gln Gly Glu Thr Val Ser Asp
165 170 175

Val Arg Thr Leu Pro Asn Ala Thr Thr Val Asp Asn Ile Arg Ser Ile
180 185 190

Phe Gly Asn Ala Val Ser Arg Glu Leu Ile Glu Val Gly Cys Glu Asp
195 200 205

Lys Thr Leu Ala Phe Lys Met Asn Gly Tyr Ile Ser Asn Ala Lys Tyr
210 215 220

Ser Val Lys Lys Cys Ile Phe Leu Leu Phe Ile Asn His Arg Leu Val
225 230 235 240

Glu Ser Ala Ala Leu Arg Lys Ala Ile Glu Thr Val Tyr Ala Ala Tyr
245 250 255

Leu Pro Lys Thr His Thr His Ser Cys Thr Ser Val Glx Asn Gln Pro
260 265 270

Ser Glu Arg Asp Val Asn Val His Pro Thr Lys Thr Glu Val His Phe
275 280 285

Leu His Glu Glu Ser Ile Leu Gln Arg Val Gln Gln His Ile Glu Ser
290 295 300

Lys Leu Leu Gly Ser Asn Ser Ser Arg Met Val Phe His Pro Asp Leu
305 310 315 320

Ala Ser Arg Thr Cys Trp Ala Ser Gly Glu Ala Ala Arg Pro Thr Thr
325 330 335

Gly Val Ala Ser Ser Ser Thr Ser Gly Ser Gly Asp Lys Val Tyr Ala
340 345 350

Tyr Gln Met Ser Arg Thr Asp Ser Arg Asp Gln Lys Leu Asp Ala Phe
355 360 365

Leu Gln Pro Val Ser Ser Leu Val Pro Ser Gln Pro Gln Asp Pro Arg
370 375 380

Pro Val Arg Gly Ala Arg Thr Glu Gly Ser Pro Glu Arg Ala Thr Arg
385 390 395 400

Glu Asp Glu Glu Met Leu Ala Leu Pro Ala Pro Ala Glu Ala Ala Ala
405 410 415

Glu Ser Glu Asn Leu Glu Arg Glu Ser Leu Met Glu Thr Ser Asp Ala
420 425 430

Ala Gln Lys Ala Ala Pro Thr Ser Ser Pro Gly Ser Ser Arg Lys Ser
435 440 445

His Arg Glu Asp Ser Asp Val Glu Met Val Glu Asn Ala Ser Gly Lys
450 455 460

Glu Met Thr Ala Ala Cys Tyr Pro Arg Arg Arg Ile Ile Asn Leu Thr
465 470 475 480

Ser Val Leu Ser Leu Gln Glu Glu Ile Ser Glu Arg Cys His Glu Thr
485 490 495

Leu Arg Glu Ile Leu Arg Asn His Ser Phe Val Gly Cys Val Asn Pro
500 505 510

Gln Trp Ala Leu Ala Gln His Gln Thr Lys Leu Tyr Leu Leu Asn Thr
515 520 525

Thr Lys Leu Ser Glu Glu Leu Phe Tyr Gln Ile Leu Ile Tyr Asp Phe
530 535 540

Ala Asn Phe Gly Val Leu Arg Leu Ser Glu Pro Ala Pro Leu Phe Asp
545 550 555 560

Leu Ala Met Leu Ala Glx Thr Val Leu Lys Val Ala Gly Gln Arg Thr
565 570 575

Thr Ala Arg Arg Arg Ala Cys Arg Val His Cys Arg Val Ser Glu Glu
580 585 590

Lys Arg Asp Ala Cys Arg Leu Phe Ser Val Arg Ser Met Arg Arg Glu
595 600 605

Pro Asp Glx Leu Leu Phe Glx Glx Gln Leu Cys Ala Thr Phe Gly Gly
610 615 620

Thr Ala Tyr Leu His Ser Ser Thr Gly His Glx Gly Glu Leu Gly Glu
625 630 635 640

Glu Lys Glu Cys Phe Glu Ser Leu Ser Lys Glu Cys Ala Met Phe Tyr
645 650 655

Ser Ile Arg Lys Gln Tyr Ile Leu Glu Glu Ser Thr Leu Ser Gly Gln
660 665 670

Gln Ser Asp Met Pro Gly Ser Thr Ser Lys Pro Trp Lys Trp Thr Val
675 680 685

Glu His Ile Ile Tyr Lys Ala Phe Arg Ser His Leu Leu Pro Pro Lys
690 695 700

His Phe Thr Glu Asp Gly Asn Val Leu Gln Leu Ala Asn Leu Pro Asp
705 710 715 720

Leu Tyr Lys Val Phe Glu Arg Cys
725

(2) INFORMATION FOR SEQ ID NO: 137: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3065 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 
CGGTGAAGGT CCTGAAGAAT TTCCAGATTC CTGAGTATCA TTGGAGGAGA CAGATAACCT 60 
GTCGTCAGGT AACGATGGTG TATATGCAAC AGAAATGGGT GTTCCTGGAG ACGCGTCTTT 120 
TCCCGAGAGC GGCACCGCAA CTCTCCCGCG GTGACTGTGA CTGGAGGAGT CCTGCATCCA 180 
TGGAGCAAAC CGAAGGCGTG AGTACAGAAT GTGCTAAGGC CATCAAGCCT ATTGATGGGA 240 
AGTCAGTCCA TCAAATTTGT TCTGGGCAGG TGATACTCAG TTTAAGCACC GCTGTGAAGG 300 
AGTTGATAGA AAATAGTGTA GATGCTGGTG CTACTACTAT TGATCTAAGG CTTAAAGACT 360
ATGGGGTGGA CCTCATTGAA GTTTCAGACA ATGGATGTGG GGTAGAAGAA GAAAACTTTG 420 
AAGGTCTAGC TCTGAAACAT CACACATCTA AGATTCAAGA GTTTGCCGAC CTCACGCAGG 480 
TTGAAACTTT CGGCTTTCGG GGGGAAGCTC TGAGCTCTCT GTGTGCACTA AGTGATGTCA 540 
CTATATCTAC CTGCCACGGG TCTGCAAGCG TTGGGACTCG ACTGGTGTTT GACCATAATG 600 
GGAAAATCAC CCAGAAAACT CCCTACCCCC GACCTAAAGG AACCACAGTC AGTGTGCAGC 660
ACTTATTTTA TACACTACCC GTGCGTTACA AAGAGTTTCA GAGGAACATT AAAAAGGAGT 720
ATTCCAAAAT GGTGCAGGTC TTACAGGCGT ACTGTATCAT CTCAGCAGGC GTCCGTGTAA 780 
GCTGCACTAA TCAGCTCGGA CAGGGGAAGC GGCACGCTGT GGTGTGCACA AGCGGCACGT 840 
CTGGCATGAA GGAAAATATC GGGTCTGTGT TTGGCCAGAA GCAGTTGCAA AGCCTCATTC 900 
CTTTTGTTCA GCTGCCCCCT AGTGACGCTG TGTGTGAAGA GTACGGCCTG AGCACTTCAG 960 
GACGCCACAA AACCTTTTCT ACGTTTTCGG GCTTCATTTC ACAGTGCACG CACGGCGCCG 1020 
GGAGGAGTGC AACAGACAGG CAGTTTTTCT TCATCAATCA GAGGCCCTGT GACCCAGCAA 1080 
AGGTCTCTAA GCTTGTCAAT GAGGTTTATC ACATGTATAA CCGGCATCAG TACCCATTTG 1140 
TCGTCCTTAA CGTTTCCGTT GACTCAGAAT GTGTGGATAT TAATGTAACT CCAGATAAAA 1200 
GGCAAATTCT ACTACAAGAA GAGAAGCTAT TGCTGGCCGT TTTAAAGACC TCCTTGATAG 1260 
GAATGTTTGA CAGTGATGCA AACAAGCTTA ATGTCAACCA GCAGCCACTG CTAGATGTTG 1320 
AAGGTAACTT AGTAAAGTCG CATACTGCAG AACTAGAAAA GCCTGTGCCA GGAAAGCAAG 1380 
ATAACTCTCC TTCACTGAAG AGCACAGCAG ACGAGAAAAG GGTAGCATCC ATCTCCAGGC 1440 
TGAGAGAGGC CTTTTCTCTT CATCCTACTA AAGAGATCAA GTCTAGGGGT CCAGAGACTG 1500 
CTGAACTGAC ACGGAGTTTT CCAAGTGAGA AAAGGGGCGT GTTATCCTCT TATCCTTCAG 1560 
ACGTCATCTC TTACAGAGGC CTCCGTGGCT CGCAGGACAA ATTGGTGAGT CCCACGGACA 1620 
GCCCTGGTGA CTGTATGGAC AGAGAGAAAA TAGAAAAAGA CTCAGGGCTC AGCAGCACCT 1680 
CAGCTGGCTC TGAGGAAGAG TTCAGCACCC CAGAAGTGGC CAGTAGCTTT AGCAGTGACT 1740 
ATAACGTGAG CTCCCTAGAA GACAGACCTT CTCAGGAAAC CATAAACTGT GGTGACCTGC 1800 
TGCCGTCCTC CAGGTACAGG ACAGTCCTTG AAGCCAGAAG ACCATGGATA TCAATGCAAA 1860 
GCXCTACCTC TAGCTCGTCT GTCACCCACA AATGCCAAGC GCTTCAAGAC AGAGGAAGAC 1920 
CCTCAAATGT CAACATATCT CAAAGATTGC CTGGTCCTCA GAGCACCTCA GCAGCTGAGG 1980 
TCGATGTAGC CATAAAAATG AATAAGAGAT CGTGCTCCTC GAGTTCTCTA GCTAAGCGAA 2040 
TGAAGCAGTT ACAGCACCTA AAGGCGCAGA ACAAACATGA ACTGAGTTAC AGAAAATTTA 2100
GGGCCAAGAT TTGCCCTGGA GAAAACCAAG CAGCAGAAGA TGAACTCAGA AAAGAGATTA 2160 
GTAAATCGAT GTTTGCAGAG ATGGAGATCT TGGGTCAGTT TAACCTGGGA TTTATAGTAA 2220
CCAAACTGAA AGAGGACCTC TTCCTGGTGG ACCAGCATGC TGCGGATGAG AAGTACAACT 2280
TTGAGATGCT GCAGCAGCAC ACGGTGCTCC AGGCGCAGAG GCTCATCACG TGGGTGCACN 2340
CAGGCTTCAG AGTTCCCAGA CCCCAGACTC TGAACTTAAC TGCTGTCAAT GAAGCTGTAC 2400 
TGATAGAAAA TCTGGAAATA TTCAGAAAGA ATGGCTTTGA CTTTGTCATT GATGAGGATG 2460 
CTCCAGTCAC TGAAAGGGCT AAATTGATTT CCTTACCAAC TAGTAAAAAC TGGACCTTTG 2520 
GACCCCAAGA TATAGATGAA CTGATCTTTA TGTTAAGTGA CAGCCCTGGG GTCATGTGCC 2580 
GGCCCTCACG AGTCAGACAG ATGTTTGCTT CCAGAGCCTG TCGGAAGTCA GTGATGATTG 2640 
GAACGGCGCT CAATGCGAGC GAGATGAAGA AGCTCATCAC CCACATGGGT GAGATGGACC 2700 
ACCCCTGGAA CTGCCCCCAC GGCAGGCCAA CCATGAGGCA CGTTGCCAAT CTGGATGTCA 2760 
TCTCTCAGAA CTGACACACC CCTTGTAGCA TAGAGTTTAT TACAGATTGT TCGGTTCGCA 2820 
AAGAGAAGGT TTTAAGTAAT CTGATTATCG TTGTACAAAA ATTAGCATGC TGCTTTAATG 2880 
TACTGGATCC ATTTAAAAGC AGTGTTAAGG CAGGCATGAX GGAGTGTTCC TCTAGCTCAG 2940 
CTACTTGGGT GATCCGGTGG GAGCTCATGT GAGCCCAGGA CTTTGAGACC ACTCCGAGCC 3000 
ACATTCATGA GACTCAATTC AAGGACAAAA AAAAAAAGAT ATTTTTGAAG CCTTTTAAAA 3060 
AAAAA 3065

(2) INFORMATION FOR SEQ ID NO: 138:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 864 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 

Met Glu Gln Thr Glu Gly Val Ser Thr Glu Cys Ala Lys Ala Ile Lys
1 5 10 15

Pro Ile Asp Gly Lys Ser Val His Gln Ile Cys Ser Gly Gln Val Ile
20 25 30

Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Ile Glu Asn Ser Val Asp
35 40 45

Ala Gly Ala Thr Thr Ile Asp Leu Arg Leu Lys Asp Tyr Gly Val Asp
50 55 60

Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val Glu Glu Glu Asn Phe
65 70 75 80

Glu Gly Leu Ala Leu Lys His His Thr Ser Lys Ile Gln Glu Phe Ala
85 90 95

Asp Leu Thr Gln Val Glu Thr Phe Gly Phe Arg Gly Glu Ala Leu Ser
100 105 110

Ser Leu Cys Ala Leu Ser Asp Val Thr Ile Ser Thr Cys His Gly Ser
115 120 125

Ala Ser Val Gly Thr Arg Leu Val Phe Asp His Asn Gly Lys Ile Thr
130 135 140

Gln Lys Thr Pro Tyr Pro Arg Pro Lys Gly Thr Thr Val Ser Val Gln
145 150 155 160

His Leu Phe Tyr Thr Leu Pro Val Arg Tyr Lys Glu Phe Gln Arg Asn
165 170 175

Ile Lys Lys Glu Tyr Ser Lys Met Val Gln Val Leu Gln Ala Tyr Cys
180 185 190

Ile Ile Ser Ala Gly Val Arg Val Ser Cys Thr Asn Gln Leu Gly Gln
195 200 205

Gly Lys Arg His Ala Val Val Cys Thr Ser Gly Thr Ser Gly Met Lys
210 215 220

Glu Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile
225 230 235 240

Pro Phe Val Gln Leu Pro Pro Ser Asp Ala Val Cys Glu Glu Tyr Gly
245 250 255

Leu Ser Thr Ser Gly Arg His Lys Thr Phe Ser Thr Phe Ser Gly Phe
260 265 270

Ile Ser Gln Cys Thr His Gly Ala Gly Arg Ser Ala Thr Asp Arg Gln
275 280 285

Phe Phe Phe Ile Asn Gln Arg Pro Cys Asp Pro Ala Lys Val Ser Lys
290 295 300

Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg His Gln Tyr Pro Phe
305 310 315 320

Val Val Leu Asn Val Ser Val Asp Ser Glu Cys Val Asp Ile Asn Val
325 330 335

Thr Pro Asp Lys Arg Gln Ile Leu Leu Gln Glu Glu Lys Leu Leu Leu
340 345 350

Ala Val Leu Lys Thr Ser Leu Ile Gly Met Phe Asp Ser Asp Ala Asn
355 360 365

Lys Leu Asn Val Asn Gln Gln Pro Leu Leu Asp Val Glu Gly Asn Leu
370 375 380

Val Lys Ser His Thr Ala Glu Leu Glu Lys Pro Val Pro Gly Lys Gln
385 390 395 400

Asp Asn Ser Pro Ser Leu Lys Ser Thr Ala Asp Glu Lys Arg Val Ala
405 410 415

Ser Ile Ser Arg Leu Arg Glu Ala Phe Ser Leu His Pro Thr Lys Glu
420 425 430

Ile Lys Ser Arg Gly Pro Glu Thr Ala Glu Leu Thr Arg Ser Phe Pro
435 440 445

Ser Glu Lys Arg Gly Val Leu Ser Ser Tyr Pro Ser Asp Val Ile Ser
450 455 460

Tyr Arg Gly Leu Arg Gly Ser Gln Asp Lys Leu Val Ser Pro Thr Asp
465 470 475 480

Ser Pro Gly Asp Cys Met Asp Arg Glu Lys Ile Glu Lys Asp Ser Gly
485 490 495

Leu Ser Ser Thr Ser Ala Gly Ser Glu Glu Glu Phe Ser Thr Pro Glu
500 505 510

Val Ala Ser Ser Phe Ser Ser Asp Tyr Asn Val Ser Ser Leu Glu Asp
515 520 525

Arg Pro Ser Gln Glu Thr Ile Asn Cys Gly Asp Leu Leu Pro Ser Ser
530 535 540

Arg Tyr Arg Thr Val Leu Glu Ala Arg Arg Pro Trp Ile Ser Met Gln
545 550 555 560

Ser Ser Thr Ser Ser Ser Ser Val Thr His Lys Cys Gln Ala Leu Gln
565 570 575

Asp Arg Gly Arg Pro Ser Asn Val Asn Ile Ser Gln Arg Leu Pro Gly
580 585 590

Pro Gln Ser Thr Ser Ala Ala Glu Val Asp Val Ala Ile Lys Met Asn
595 600 605

Lys Arg Ser Cys Ser Ser Ser Ser Leu Ala Lys Arg Met Lys Gln Leu
610 615 620

Gln His Leu Lys Ala Gln Asn Lys His Glu Leu Ser Tyr Arg Lys Phe
625 630 635 640

Arg Ala Lys Ile Cys Pro Gly Glu Asn Gln Ala Ala Glu Asp Glu Leu
645 650 655

Arg Lys Glu Ile Ser Lys Ser Met Phe Ala Glu Met Glu Ile Leu Gly
660 665 670

Gln Phe Asn Leu Gly Phe Ile Val Thr Lys Leu Lys Glu Asp Leu Phe
675 680 685

Leu Val Asp Gln His Ala Ala Asp Glu Lys Tyr Asn Phe Glu Met Leu
690 695 700

Gln Gln His Thr Val Leu Gln Ala Gln Arg Leu Ile Thr Trp Val His
705 710 715 720

Thr Gly Phe Arg Val Pro Arg Pro Gln Thr Leu Asn Leu Thr Ala Val
725 730 735

Asn Glu Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly
740 745 750

Phe Asp Phe Val Ile Asp Glu Asp Ala Pro Val Thr Glu Arg Ala Lys
755 760 765

Leu Ile Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro Gln Asp
770 775 780

Ile Asp Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly Val Met Cys
785 790 795 800

Arg Pro Ser Arg Val Arg Gln Met Phe Ala Ser Arg Ala Cys Arg Lys
805 810 815

Ser Val Met Ile Gly Thr Ala Leu Asn Ala Ser Glu Met Lys Lys Leu
820 825 830

Ile Thr His Met Gly Glu Met Asp His Pro Trp Asn Cys Pro His Gly
835 840 845

Arg Pro Thr Met Arg His Val Ala Asn Leu Asp Val Ile Ser Gln Asn
850 855 860

(2) INFORMATION FOR SEQ ID NO: 139: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
CTTGATTCTA GAGCYTCNCC NCKRAANCC 29 



(2) INFORMATION FOR SEQ ID NO: 140: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
AGGTCGGAGC TCAARGARYT NGTNGANAA 29 
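
SEQ ID NOS: 139 and 140 contain ambiguity symbols (Y, R, K, N), so each entry describes a pool of oligonucleotides rather than a single sequence. As an illustrative sketch only, and assuming the standard IUPAC nucleotide codes are the intended meaning of these symbols, the size of each pool can be counted as follows. The function name `degeneracy` is hypothetical and is not part of the listing.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes (assumed to apply to the listing).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(oligo: str) -> int:
    """Count the distinct sequences represented by a degenerate oligo."""
    bases = oligo.replace(" ", "").upper()
    return prod(len(IUPAC[b]) for b in bases)

SEQ_139 = "CTTGATTCTA GAGCYTCNCC NCKRAANCC"
SEQ_140 = "AGGTCGGAGC TCAARGARYT NGTNGANAA"
print(degeneracy(SEQ_139), degeneracy(SEQ_140))
```

Each ambiguous position multiplies the pool size by the number of bases it allows, so the count grows quickly with the number of N positions.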

(2) INFORMATION FOR SEQ ID NO: 141: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 
ACTTGTGGAT TTTGC 15 

(2) INFORMATION FOR SEQ ID NO: 142:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:
ACTTGTGAAT TTTGC 15 

(2) INFORMATION FOR SEQ ID NO: 143: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143: 
TTCGGTGACA GATTTGTAAA TG 22 

(2) INFORMATION FOR SEQ ID NO: 144: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144: 
TTTACGGAGC CCTGGC 16

(2) INFORMATION FOR SEQ ID NO: 145:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145: 
TCACCATAAA AATAGTTTCC CG 22

(2) INFORMATION FOR SEQ ID NO: 146:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146: 
TCCTGGATCA TATTTTCTGA GC 22 

(2) INFORMATION FOR SEQ ID NO: 147: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147: 
TTTCAGGTAT GTCCTGTTAC CC 22 

(2) INFORMATION FOR SEQ ID NO: 148: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148: 
TGAGGCAGCT TTTAAGAAAC TC 22

WE CLAIM:

1. A method of diagnosing cancer susceptibility in a subject 
comprising detecting a mutation in a mutL homolog gene or gene product in a 
tissue of the subject, the mutation being indicative of the subject's susceptibility 
to cancer. 

2. A method of identifying and classifying a DNA mismatch- 
repair-defective tumor comprising detecting in a tumor a mutation in a mutL 
homolog gene or gene product, the mutation being indicative of a defect in a 
mismatch repair system of the tumor. 

3. The method of claim 1 or claim 2 wherein the step of
detecting comprises detecting a mutation in hMLH1 or hPMS1.

4. The method of claim 1 or claim 2 wherein the step of
detecting comprises isolating nucleic acid from the subject;

amplifying a segment of the mismatch repair gene or gene product 
from the isolated nucleic acid; 

comparing the amplified segment with an analogous segment of a 
wild-type allele of the mismatch repair gene or gene product; and 

detecting a difference between the amplified segment and the 
analogous segment, the difference being indicative of a mutation in the mismatch 
repair gene or gene product. 
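
The comparison step recited in claim 4 can be pictured with a small sketch. It assumes two already-aligned segments of equal length and reports only substitutions; the function name `substitutions` is hypothetical. SEQ ID NOS: 141 and 142 are used purely as example inputs, and treating one as the reference and the other as the sample is an assumption made for illustration.

```python
def substitutions(wild_type: str, sample: str):
    """Return (1-based position, wild-type base, sample base) per mismatch."""
    wt = wild_type.replace(" ", "").upper()
    s = sample.replace(" ", "").upper()
    if len(wt) != len(s):
        # Insertions and deletions would require an alignment step first.
        raise ValueError("sequences differ in length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(wt, s)) if a != b]

reference = "ACTTGTGGAT TTTGC"  # SEQ ID NO: 141, treated here as the reference
sample = "ACTTGTGAAT TTTGC"     # SEQ ID NO: 142, treated here as the sample
print(substitutions(reference, sample))
```

A position-by-position comparison of this kind covers only the substitution case; detecting insertions, deletions or rearrangements would need a proper sequence alignment.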

5. The method of claim 4 wherein the step of detecting 
comprises determining whether the difference between the amplified segment and 
the analogous segment causes an affected phenotype.

6. The method of claim 4 wherein the difference in nucleotide
sequence is selected from the group consisting of deletions of at least one 
nucleotide, insertions of at least one nucleotide, substitutions of at least one 
nucleotide and nucleotide rearrangements. 

7. The method of claim 4 wherein the step of amplifying 

comprises: 

reverse transcribing all or a portion of an RNA mismatch repair 
gene product to DNA; and 

amplifying a segment of the DNA produced by reverse transcription. 

8. The method of claim 4 wherein the step of amplifying 

comprises: 

selecting a pair of oligonucleotide primers capable of hybridizing to 
opposite strands of the mismatch repair gene, and in opposite orientation; 

performing a polymerase chain reaction utilizing the oligonucleotide 
primers such that nucleic acid of the mismatch repair chain intervening between 
the primers is amplified to become the amplified segment. 

9. The method of claim 8 wherein the intervening nucleic acid 
comprises at least a fragment of at least one exon of the mismatch repair gene. 

10. The method of claim 9 wherein the at least one exon has a 
nucleotide sequence selected from the group consisting of SEQ ID NOS: 25-43.

11. The method of claim 1 or claim 2 wherein the step of
detecting comprises detecting a mutation in a mutL homolog mismatch repair 
protein. 

12. The method of claim 4 wherein the analogous segment of a
wild-type allele of the mismatch repair gene or gene product comprises a
wild-type hMLH1 gene fragment having a unique portion of nucleotide sequence
selected from the group consisting of: SEQ ID NOS: 6-24.

13. The method of claim 8 wherein the step of selecting 
comprises selecting a pair of oligonucleotide primers, each primer of the pair 
comprising a nucleotide sequence selected from the group consisting of: SEQ ID 
NOS: 44-82. 


14. The method of claim 8 wherein the intervening nucleotide 

sequence that is amplified comprises a unique portion of at least one nucleotide 
sequence selected from the group consisting of: SEQ ID NOS: 6-24. 

15. The method of claim 4 wherein the step of detecting a
difference comprises detecting an hMLH1 mutation characterized by a C to T
transition mutation which produces a non-conservative amino acid substitution at
position 44 of the hMLH1 protein.

16. The method of claim 5 wherein the step of determining

comprises:

deriving a yeast strain that is deleted for its hMLH1 gene;
constructing a yeast homolog of the amplified segment including the 

difference; 

introducing the yeast homolog of the amplified segment into the 
yeast strain; and 

assaying the yeast strain's ability to correct DNA mispairs.

17. The method of claim 5 wherein the step of determining
comprises producing an hMLH1 protein including amino acids corresponding to
the difference; and determining the extent of interaction between the hMLH1
protein and an hPMS1 protein compared to the degree of protein-protein
interaction observed with wild-type hMLH1 and hPMS1 proteins.

18. An isolated oligonucleotide primer capable of hybridizing
specifically to all or a fragment of an hMLH1 genomic sequence with a Tm of
greater than about 55° C.
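
Claim 18 does not state how the Tm is determined. As a rough illustration only, the sketch below applies the Wallace rule of thumb for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees C, which is an assumption and not a method taken from this document. The function name `wallace_tm` is hypothetical, and SEQ ID NO: 143 is used simply as an example input.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = oligo.replace(" ", "").upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

primer = "TTCGGTGACA GATTTGTAAA TG"  # SEQ ID NO: 143, example input only
tm = wallace_tm(primer)
print(tm, tm > 55)
```

The rule ignores salt concentration, primer concentration and nearest-neighbor effects, so it serves only as a first-pass screen for candidate primers.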

19. The isolated oligonucleotide primer of claim 18, the 
oligonucleotide primer being extendable by a DNA polymerase. 

20. The isolated oligonucleotide primer of claim 19, the
oligonucleotide primer being capable of amplifying at least a portion of an
hMLH1 gene when used in a polymerase chain reaction including another primer.

21. The isolated oligonucleotide primer of claim 20, the
oligonucleotide primer being at least 13 nucleotides in length.

22. The isolated oligonucleotide primer of claim 21 comprising 
a nucleotide sequence selected from the group consisting of SEQ ID NOS: 44-82. 

23. An isolated nucleic acid including a segment having a 
nucleotide sequence substantially identical to a nucleotide sequence selected from 
the group consisting of SEQ ID NOS: 6-24. 

24. An isolated nucleic acid including a segment having a
nucleotide sequence substantially identical to a nucleotide sequence selected from 
the group consisting of SEQ ID NOS: 25-43. 

25. A unique fragment of the nucleic acid of claim 23 or 

claim 24. 

26. A method of detecting a mutation in a eukaryotic mutL 
homolog gene or fragment thereof comprising the steps of: 

isolating a eukaryotic mutL homolog gene or fragment thereof; and 
detecting a difference in activity between the isolated gene or 
fragment thereof and a wild-type allele of the gene or fragment thereof; the 
difference in activity being indicative of a mutation in the eukaryotic mutL 
homolog gene or fragment thereof.

27. A method of detecting a mutation in a eukaryotic mutL
homolog gene or gene product comprising detecting a difference in activity
between the gene or gene product and a wild-type version of the gene or gene
product, the difference in activity being indicative of a mutation in the mutL
homolog gene or gene product.

28. The method of claim 26 wherein the eukaryotic mutL 
homolog gene or fragment thereof comprises a human gene or fragment thereof. 

29. The method of claim 27 wherein the mutL homolog gene or 
gene product comprises a human gene or gene product. 

30. The method of claim 28 or claim 29 wherein the gene
comprises an hMLH1 and the wild-type version of the gene comprises a wild-type
allele of the hMLH1 gene.

31. The method of claim 28 or claim 29 wherein the gene
comprises a hPMS1 and the wild-type version of the gene comprises a wild-type
allele of the hPMS1 gene.

32. The method of claim 30 wherein the wild-type version of the
hMLH1 gene comprises a nucleotide sequence substantially identical to a
nucleotide sequence selected from the group consisting of SEQ ID NOS: 6-24,
and unique fragments thereof.

33. The method of claim 30 wherein the wild-type version of the
hMLH1 gene encodes a polypeptide comprising an amino acid sequence selected
from the group consisting of SEQ ID NO: 5 and unique fragments thereof.

34. The method of claim 28 or claim 29 wherein the human
mismatch repair gene product comprises a hMLH1 protein or unique fragment
thereof.

35. The method of claim 34 wherein the hMLH1 protein
comprises an amino acid sequence selected from the group consisting of SEQ ID
NO: 5 and unique fragments thereof.

36. An isolated nucleotide or protein structure including a 
segment sequentially corresponding to a unique portion of a human mutL 
homolog gene or gene product. 

37. The nucleotide of claim 36 wherein the mutL homolog gene
is hMLH1 or hPMS1.

38. A pair of oligonucleotide primers capable of being used 
together in a polymerase chain reaction to amplify specifically a unique segment 
of a human mutL homolog gene. 

39. The pair of oligonucleotide primers of claim 38 wherein the
mutL homolog gene is hMLH1 or hPMS1.

40. A probe comprising

a nucleotide sequence capable of binding specifically by 
Watson/Crick pairing to complementary bases in a portion of a human mutL 
homolog gene; and 

a label-moiety attached to the sequence, wherein the label-moiety 
has a property selected from the group consisting of fluorescent, radioactive and 
chemiluminescent. 

41. The probe of claim 40 wherein the human mutL homolog
gene is hMLH1 or hPMS1.

42. An amplified quantity of a nucleotide including a segment 
corresponding to a unique portion of a human mutL homolog gene. 

43. The nucleotide of claim 42 wherein the human mutL
homolog gene is hMLH1 or hPMS1.

44. A pair of oligonucleotide primers capable of being employed 
in a polymerase chain reaction to amplify specifically a single exon from a human 
mutL homolog gene along with selected portions of flanking upstream and 
downstream introns. 



45. The primers of claim 44 wherein the human mutL homolog
gene is hMLH1 or hPMS1.

46. The method of claim 1 wherein the detecting step comprises
detecting a mutation in a portion of the individual's hMLH1 gene, the portion
being homologous to the DNA sequence including and between the two sets of 
underlined bases in Figure 3. 

47. The nucleotide of claim 37 wherein the segment is 
homologous to the DNA sequence including and between the two sets of 
underlined bases in Figure 3. 

48. An isolated nucleotide or protein structure including a 
segment substantially corresponding to a unique portion of a mouse mutL 
homolog gene or gene product. 

49. The structure of claim 48 wherein the segment substantially 
corresponds to a unique portion of a mammalian MLH1 or PMS1 gene or protein. 

50. Purified antibodies binding specifically to a MutL homolog 

protein. 

51. The antibodies of claim 50 wherein the antibodies are 
monoclonal antibodies. 



52. The antibodies of claim 50 wherein the MutL homolog
protein is a human protein.

53. The antibodies of claim 52 wherein the protein is hMLH1
or hPMS1.

54. The antibodies of claim 50 wherein the MutL homolog
protein is a mouse protein.

55. The antibodies of claim 54 wherein the protein is mMLH1
or mPMS1.

Guide for the isolation and characterization of mammalian PMS1 and MLH1 genes.

Step 1 Design of degenerate oligonucleotide pools for PCR.

Step 2 Reverse transcription and PCR on poly A+ selected mRNA isolated from
human cells.

Step 3 Cloning and sequencing of PCR generated fragments; identification of two
gene fragments representing human PMS1 and MLH1.

Step 4 Isolation of complete human and mouse PMS1 and MLH1 cDNA clones
using the PCR fragments as probes.

Step 5 Isolation of human and mouse PMS1 and MLH1 genomic clones.

Step 6 Chromosome positional mapping of the human and mouse PMS1 and
MLH1 genes by fluorescence in situ hybridization.

Step 7 Using genomic and cDNA sequences to identify mutations in PMS1 and
MLH1 genes from HNPCC families.

Step 8 Design targeting vectors to disrupt mouse PMS1 and MLH1 genes in ES
cells; study mice deficient in mismatch repair.

Figure 2. Amino acid sequence alignment of MutL (SEQ. ID NO: 1), HexB (SEQ. ID NO: 2) and PMS1 (SEQ. ID NO: 3).
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CTTGGCTCTTCTGGCGCCAAAATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGACGAG 60 « 

KSFVAGVIRRLDE. 
ACAGTGGTGAACCGCATCGCGGCGGGGGAAGTTATCCAGCGGCCAGCTAATGCTATC AAA. 12 0 
TVVNRIAAGEVIQRPANAIK 
GAGATGATTGAGAAC TGTTTAGATGCAAAATCCACAAGTATTCAAGTCATTGTTAAAGAG 180 
EMXENCLDAKSTSIQVIVKE 
GGAGGCCTGAAGTTGATTCAGATCCAAGACAATGGCACCGGGATCAGGAAAGAAGATCTG 24 0 
GGL KLIGIQDNGTGI R K E D It 
GATATTGTATGTGAAAGGTTC ACTACTAGTAAACTGCAGTCCTTTGAGGATTTAGCCAGT 300 
DIVCERFTTSKX.QSFED&AS 
ATTTCTACCTAT GGCTTTCGAGGTGAGGC TTTC<5CCAGCATAAGCCATCTGGCTCATGTT 360 
I STYGFRGEAI#AS I SBVAHV 
ACTATTACAACGAAAACAGCTGATGGAAAGTGTGCATACAGAGCAAGTTACTCAGATGGA 420 
TITTKTADGKCAYRASYSDG 
AAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGGGACCCAGATCACGGTGGAG 480 
K L K A P PKPCAGNQGTQITVE 
GACCTTTTTTACAACATAGCCACGAGCAGAAAAGGTTTAAAAAATCCAAG1!CAAGAATAT 540 
DLFYNIATRRKALKNPSEEY 
GGG AAAATTTTGG AAGTTG TTGGC AGGT ATTC AGT ACAC AATG CAGGC ATT AGTTTCTGA € 0 0 
GKILEVVGRYSVHNAGISFS 
GTTAAAAAACAAGGAGAGACAGTAGCTGATGTTAGGACACTACCCAATGCCTCAACCGTG 660 
VKKQGETVADVRTI.PNASTV 
GACAAT ATTC GCTC CATC TTTGG AAATGC TGTTAGTCGAGAACTGATAG AAATTGGATGT 720 
DNIRSIFGNAVSRELIEIGC 
GAGGATAAAACCCTAGCCTTCAAAATGAATGGTTACATATCCA^ 780 
EDKTLAFKMNGYISNANYSV 
AAGAAGTGCATCTTCTTACTCTTCATCAACCATC 84 0 

KKCIFLLFINHRLVESTSLR 
AAAGCCATAGAAACAGTGTATGCAGC^^AT^ 900 
KAIETVYAAYLPKNTHPFItY 
CITCAGTTTAGAAATCAGTCCCCAGAATGTGG ATGTTAATGTG 9 60 

LSLEISPQNVDVNVHPTKHE 
GTTCA<^TTCC TGCACG AGG AG AGCATCCTTGGAGC GGGTGC AGCAGCACATCGAGAGCAAG 1020 
VHFLHEESXXjERVQQHXESK 

ctcctgg<x;tccaattcx:tcciaggatgtacttcac 1060 

X. I. G S N S SRMYFTQTLLPGL A 
GGCCCCTCIHSGGGAGATGGTTAAATCCACAACAAGT^ 114 0 

GPSGEMVKSTTS X# T *5 S S T S G 
AGTAGTGATAAGGTCTATCCCCAC(^GATGGTTCGTACAGATTCCCGGG 1200 
S SDKVY AH QMV RT D S R E Q K X, 
GATGCATTTCTCCAGCCTCTGAGCAAACCCCTC^CCAGTCAG^ 1260 
DAFLQPL.SKPL SSQPQAIVT 
GAGGATAAGACAGATAT«C?AGTGG<^G<^TAGGCAGC 1320 
EDKTDI SSGRARQQDEEM LE 
CTCCCAGCCCCTGCTGAAGTGGCTGCCAAAAATCAGAG^TTGGAGGGGGATACAACAAAG 1360 
LPAPAEVAAKHQSI4EGDTTK 
GGGACTTCAGAAATGTCAGAGAAGAGAGGACCTACTTCCAGCAACCCCAGAAAGAGACAT 1440 

gtsems'ekrgptssnprkrh 
cgggaagattctgatgtggaaatggtggaagatgattcccgaaaggaaatgactxs^ 1500 

redsdve .mv E DOS RK EMTAA 
TGTACCCCCCGGAGAAGC ' "CATTAACCTCACTAGTGTTTTGAGTCTCCAGGAAGAAATT 1560 
c tprrr: INLTSVLS^QEEI 
AATGAGCAGGGACATG JGTTCTCCGGGAGATGTTGCATAACCACTCCTTCGTGGGCTGT 1620 
NEQGHEVLREKLHNHSFVGC 
GTGAATCCTCAGTGGGCCTTGGCACAGCATCAAACCAAGTTATACCTTCTCAACACCACC 1680 
VNPQWAI#AQKQTKI*Y L L N T T 
AAGCTTAGTGAAGAACTGTTCTACC AGATACTCATTTATGATTTTGCCAATTTTGGTGTT 17 4 0 
KLSEELFYQII.XYDFANFGV 
CTCAGGTTATCGGAGCCAGCACCGCTCTTTGACCTTGCCATGCTTGCCTTAGATAGTCCA 1800 
LRLSEPAPLFDLAMLALDSP 
G AG AGTGGC TGG AC AG AGG AAG ATGGTC C CAAAG AAGG AC TT GC TGAATAC ATTGTTG AG 1860 
ESGWTEEDGPKEGt*AEYIVE 
TTTCTGAAGAAGAAGGCTGAGATGCTTGCAGACTATTTCTCTTTGGAAATTGATGAGGAA 1920 
F LKKKAEMZjADYF s l e I d e e 
GGGAACCTGATTGGATTACCCCTTCTGATTGACAACTATGTGCCCCCTTTGGAGGGACTG 1980 
GKIiXGIiPXiIfl DMYVP P L E G I* 
CCTATCTTCATTCTTCGACTAGCCACTGAGGTGAATTGGGACGAAGAAAAGGAATGTTTT 2040 
PIFILRliATEVNWDEEKECF 
GAAAGCCTCAGTAAAGAATGCGCTATGTTCTATTCCATCCGGAAGC AGTACATATCTGAG 2100 
ESLSKECAMFYSIRKQYISE 
GAGTCGACCCTCTCAGGCCAGCAGAGTGAAGTGCCTGGCTCCATTCCAAACTCCTGGAAG 2160 
ESTLSGQQSEVPGSI PNSWK 
TGGACTGTGGAACACATTGTCTATAAAGCCTTGCGCTCACACATTCTGCCTCCTAAACAT 2220 
WTVEKIVYKALRSHILPPKH 
TTCACAGAAGATGGAAATATCCTGCAGCTTGCTAACCTGCCTGATCTATACAAAGTCTTT 2280 
FTEDGNILQLANLPDX.YKVF 
GAGAGGTGTTAAATATGGTTATTTATGCACTGTGGGATGTGTTCTTCTTTCTCTGTATTC 234 0 

CGATACAAAGTGTTGTATCAAAGTGTGATATACAAAGTGTACCAACATAAGTGTTGGTAG 24 00 
CACTTAAGACTTATACTTGCCTTCTGATAGTATTCCTTTATACACAGTGGATTGATTATA 24 60 
AATAAAT 7vG AT GTGTCTTAAC AT A 24 84 



SEQ. ID NO: 4
SEQ. ID NO: 5

Human MLH1 cDNA Nucleotide and Protein Sequence

Figure 3
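
The pairing of nucleotide and protein lines in Figure 3 can be checked with a short translation sketch. It assumes the standard genetic code and that translation starts at the first ATG; the function name `translate_from_first_atg` is hypothetical. The input is the first 60 nucleotides of the cDNA as printed in Figure 3.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_from_first_atg(cdna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of input."""
    seq = cdna.replace(" ", "").upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

FIRST_60 = "CTTGGCTCTTCTGGCGCCAAAATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGACGAG"
print(translate_from_first_atg(FIRST_60))
```

The first ATG lies at nucleotides 22 to 24, which agrees with the 21 untranslated bases implied by the "(-21 to 116)" numbering of the first exon in Figure 4A.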
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#1: 18442 to 19109 (-21 to 116) 

TGGCTGGATGCTAAGCTACAGCTGAAGGAAGAACGTGAGCACG aaacactaaqqt 
aattagc TGAAGGCACTTCCGTTG^ 
gccaaa^^ 



SEQ. ID NO: 6


AACGGGCCdCGTCACTCAATGGCG CGGACACGCCTCTTTCCCCGGGCAGAGGCAT 
GTACAGCGCATGCCCACAACGGCGpAGGCCGCCGGGTTCCCTACGTGCCATAAGC 
CTTCTCCTTTTC 



SEQ. ID NO: 25



#2: 19689 to 19688 (117 to 207) 



AAACACGTTAATGAGGCACTATTGTTTGTATTTGGAGTTTGTTATCATTGCTTGG 
CTCATATTA Aaatatatacattaaaataatta CAGACTGATAAATTATTTTCTGT 



SEQ. ID NO: 7



AiatiCTC AAXG^XeC^jSGAIfig^^ 3l^6CGCTT'CAT'TCAA aaa€ca aaacc 1 1 1 ct 
ctgTTCTGGAAACTAGGCTTTTGCAGATGGGATTTTTTCACTGAAAAATTCAACA 
CCAACAATAAATATTTATTGAGTACCTATTATTTGCGGGGCACTGTTCAGGGGAT 
GTGTCAGT 

SEQ. ID NO: 26 



#3: 19687 to 19786 (208 to 306)

TTTCCTGGATTAATCAAGAAATGGAATTCAA aaaaatttaaaaaatqaqtaac AT 



SEQ. ID NO: 8



ATAAGGTTTGGTACCTTTTACTTGp?TAAATGTATGCAAATCTGAGCAAACTTAAT 
GAACTTTAACTTTCAAAGACTG 

SEQ. ID NO: 27 



#4: 18492 to 18421 (307 to 380)

TGGAAGCAGCAGNCAGAT a a CG tt t ccc t tt qq t qa aa TG ACAG^TOGGTGACCCA 



AGAGATCTTGAGTACG 



SEQ. ID NO: 9



TTGG cfcctaqatctcaqaqtaatc CTGTCTCAACACCAGTGTTATCTTTNNNGGC 



SEQ. ID NO: 28 
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#5: 18313 to 18179 (381 to 453) 



TTAATTTGTTATATTTTCTCATTAG^5 



CATGGGAGAglaaai 



SEQ. ID NO: 10



aaattgcttctaagttttcagggtaataataaaatgaatttgcactagttaatgg 
aggtcccaagatatcctctaagcaKgataaatgactattggcttttnntggcatg 
gcagcctg 

SEQ. ID NO: 29 



#6: 18318 to 18317 (454 to 545) 

GCTTTTGCCAGGACCATCTT aaattttattttcaaatacttctata AATTTACAA 



SEQ. ID NO: 11



GTGTTCTAGT actcatacattaaa caattactaaac TAGATGGTGAAAAGTAAAA 



SEQ. ID NO: 30



#7: 19009 to 19135 (546 TO 588) 

CAGCAACCTATAAAAGTAGAGAGGAGTCTGTGTTTTGACGCAGCACCTTTAGCAT 

TTTTATTTGGATGAAGTTTCTGCTGGTTTATTTTTCTGTGGGTAAAATATTAATA 

GGCTGTATGGAGATATTTTTCTTTATATGTACCTTTGTTTAGATTACTCAACTCC 

ACTAATTTATTTAACTAAAAGGGGGCTCTGACAT ctaatatatatttttaac 

TCTTTTCTTACTCTTTTGTTTTTCTT^ 

&^GAAA^^ CT aataaaaataaaattata AT 
GTTT 

SEQ. ID NO: 31 



SEQ. ID NO: 12



#8: 18197 to 18924 (589 TO 677) 

ATGTTTCAGT ctcaqccataacracaataaatcc TTGTGTCTTCTGCTGTTTGTTT 



TATAAAAAAATCTTTTACATTTATrATCTTGGTTTATCATT ccatcacattattt 
crgqaacc TTTCAAGATATTATGTGrGTTAAGAGTTTGCTTTAGTCAAATACACAG 
GCTTGTTTTATGCTTCAGATTTGTrAATGGAGTTCTTATTTCACGTAATCAACAC 
TTTCTAGGTGTATGTAATCTCCTA3ATTCTGTGGCGTGAATCATGTGTTCT 

SEQ. ID NO: 32 



SEQ. ID NO: 13
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#9: 18765 to 18198 (678 TO 790) 



ACTGAGTAGGGTAGGTGGGTGAGTGGGTGGGTGGGTGGGTGGGTGGATGGATGGA 
TGGGAGGATGGGTGGGTGAATGGGTGAACAGACAAATGGATGGATGAATGGACAG 
GCACAGGAGGACCTCAAATGGACCAAGTCTTCGGGGCCCTCATTTCACAAAGTTA 
GTTTATGGGAAGGAACCTTGTGTTTTTAAATTCTGATTCTTTTGTAATGTTTGAG 
TOTTCAGTAOTTO^ 



laacacccac aa GG AATTTTATGGG A 



CCATGGAAAAATTTCTGAGTCCATAGGTTTGATTAAACATGGAGAAACCTCATGG 
CAAAGTTTGGTTTTATTGGGAAGC ZVTGTATA 

SEQ. ID NO: 33



SEQ. ID NO: 14



#10: 18305 to 18306 (791 TO 884) 

ATAGTGGGCTGGAAAGTGGCCACAGGTAAAGGTGCACCTTTCTTCCTGGGGATGT 
GATGTGCATATCACTACAGAAATGTCTTTCCTGAGGTGATGT cataactttatat 
aaatatacacc TGTGACCTCACCC^ 

ipgaafccM 

:cr€&sEa€cacrqct 

CtcctcTTTGAAAGAGATGAGCATbCTAATAGTACAATCAGAGTGAATCCCATAC 
ACCACTGGCAAAAGGATGTTCTGTCCCTTCTTACAGGTACAAGGCACAG 

SEQ. ID NO: 34 



SEQ. ID NO: 15



#11: 18182 to 19041 (885 TO 1038) 



CTTACGCAAAGCTACACAGCTCTTAAGTAGCAGTGCCAATATTTGAACACACTCA 
GACTCGAGCCTGAGGTTTTGACCACTGTGTCATCTGGCCTCAAATCTTCTGGCCA 
CCACATACACCATATGT qggctttttctccccctCCC ACTATCTAAGGTAATTGT 



GAAGTATTACGTGGATAG 



SEQ. ID NO: 16



SEQ. ID NO: 35 
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#12: 18579 to 18178 (1039 TO 1409) 

GAT aattatacctcatactaac TTCTTTCTTAGTACTGCTCCATTTGGGGACCTG 
TATATCTATACTTCTTATTCTGAGTCTCTCCACTATATATATATATATATATATA 




SEQ. ID NO: 17



tqtaa€aaaac Tti 
CTGCCGACTTCCC 

SEQ. ID NO: 36 
The splice acceptor site is believed to have 21 T's. 



#13: 18420 to 18443 (1410 TO 1558)

CTGTGCTCCAGCACAGGTCATCCAGCTCTGTAGACCAGCGCAGAGAAGTTGCTTG 
CTCCCAAA tqcaacccacaaaatttqgc TAAGTTTAAAAACAAGAATAATAATGA 



aa 



GC^AA acfttiiiitqaaaatqaaaa aa 3CAGTCATGTTGTCAGAGTGGCACTACAGTT 



TTGATGGGC AAGCTCCTCTTCCTT rACTAACCCACAATAGCATC <aGCTTAAAGAC 
AATTTTTGATTGGGAGAAAAGGGA 3AAAATAATCTCTG 

SEQ. ID NO: 37 



SEQ. ID NO: 18



#14: 19028 TO 18897 (1559 TO 1667) 

CAGTTTTCACCAGGAGGCTCAAATCAGGCNNCTTTGCTTACT taatatctctaqt 
tctggTGCCTGGTGCTTTGGTCAATGAAGTGGGGTTGGTAGGATTCTATTACTTA 



SEQ. ID NO: 19



ad 



CAGCATGAAGGTAGTTGGGAAGGGCACAGGCTTTGGAGTCAGCACATGT 

SEQ. ID NO: 38 



Figure 4A - 4

WO 95/16793 PCT/US94/14746


#15: 19025 to 18575 (1668 TO 1731) 

CCCCTGGTTGAAGCGTTGGAATCCCACTCTTTGGANNNNNNNAGATTGTGTTAGA 
CTGTTAACCAGATTCCACAGCCAGGCAGAACTATGTCTGTCTCATCCATGTGTCA 



TTTCCTTAAAGTCACTTCATTTTTATTTTCAGp3P§ 

TTTTCACTT c€aacattt 
TA 

SEQ. ID NO: 39 



ccaCCCCGCAAACAGTAGCTCTCCACTAAA 



SEQ. ID NO: 20



Position -19 of the splice acceptor site is A in some people. Others are
heterozygous for A and G: GTCACTTC or GTCGCTTC (polymorphism).



#16: 18184 to 18314 (1732 TO 1896) 



CATTTATGGTTTCTCACCTGCCATTCTGATAGTGGATTCTTGGGAATTCAGGCTT 

Cc 

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

fr<^C^CC&T^ 
gCATTGGCTCA 

SEQ. ID NO: 40 



SEQ. ID NO: 21



#17: 18429 to 18315 (1897 TO 1989) 

CAGATAGGAGGCACAAGGCCTG aqaaaaacactqaaaaaataaa ATTTGTTTAAA 
VTTTCI 



SEQ. ID NO: 22



SEQ. ID NO: 41 



#18: 18444 to 18581 (1990 TO 2103) 

CTATATCTTCCCAGCAATATTCACAGTCCGTTTACAGTTTTAACGCCTAAAGTAT 
CACATTTCGTTTTTTAGCTT taaqtaqtctqtqatctcca TTTAGAATGAGi^TG 



SEQ. ID NO: 23



tggtg^ 
atgaaacttg 

SEQ. ID NO: 42 

Figure 4A - 5

#19: 18638 to 18637 (2104 TO 2271)
2463 is end of cDNA.



AATCCTCTTGTGTTCAGGCCTGTGGATCCCTGAGAGGCTAGCCCACAAGATCCAC 
TTCAAAAGCCCTAGATAACACCAAGTCTTTCCAGACCCAGTGCACATCCCATCAG 
CCAG gacaccaatcrtatqttaa GATGCAAACAGGGAGGCTTATGACATCTAATGT 



SEQ. ID NO: 24



TATTATGST^tATA 



SEQ. ID NO: 43 



Figure 4A - 6

MLH1 Human
MLH1 Mouse
MLH1 S. cerevisiae
MutL S. typhimurium
MutL E. coli
HexB S. pneumoniae
PMS1 S. cerevisiae

Figure 6

Figure 14
SEQ. ID NO: 137
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